Common approaches to object detection use information from CCD cameras that provide a view of the robot's environment. Nevertheless, cameras are difficult to use in natural environments with changing light conditions. Robot control architectures that include robot vision mainly rely on tracking, e.g., of distinctive, local, scale-invariant features [18], light sources [11], or the ceiling [5]. Other camera-based approaches to robot vision, e.g., stereo cameras and structure from motion, have difficulties providing navigation information for a mobile robot in real time. Camera-based systems have problems localizing objects precisely: a single camera can estimate the object distance only roughly from the known object size, limited by the image resolution. Estimating depth with stereo is imprecise, too: on robots, the width of the stereo baseline is limited to small values (e.g., 20 cm), resulting in a substantial depth (z-axis) error for objects at the scanner's maximum ranging distance of about 8 m.
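The stereo limitation follows from the standard triangulation relation, where the depth uncertainty grows quadratically with distance: Δz ≈ z²·Δd / (f·b). The sketch below plugs in the 20 cm baseline from above; the focal length and disparity error are assumed example values, not taken from this paper:

```python
def stereo_depth_error(z, baseline, focal_px, disparity_err_px=1.0):
    """Approximate depth uncertainty of a stereo rig: dz = z^2 * dd / (f * b).

    z: object distance in meters; baseline: stereo baseline in meters;
    focal_px: focal length in pixels; disparity_err_px: disparity noise
    in pixels. All numbers below are illustrative assumptions.
    """
    return z ** 2 * disparity_err_px / (focal_px * baseline)

# Assumed rig: 20 cm baseline, 500 px focal length, 1 px disparity noise.
err_8m = stereo_depth_error(8.0, 0.2, 500.0)   # depth error at 8 m
err_4m = stereo_depth_error(4.0, 0.2, 500.0)   # depth error at 4 m
```

Under these assumed parameters, even single-pixel disparity noise at 8 m translates into a depth error of several decimeters, and halving the distance reduces the error fourfold, which illustrates why stereo struggles at the laser scanner's maximum range.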
Many current successful robots are equipped with distance sensors, mainly 2D laser range finders [24]. 2D scanners, however, cannot detect 3D obstacles outside their scan plane. There is currently a general trend toward using 3D laser range finders and building 3D maps [2,19,22,23]. Nevertheless, little work has been done on interpreting the obtained 3D models. In [14] we show how complete scenes, made of several automatically registered 3D scans, are labeled using relations given in a semantic net. Object detection in 3D laser scans from mobile robots was presented in [13]. This approach is extended here: first, classification and regression trees (CARTs) are used for a more sophisticated object detection; second, objects are localized in 3D space using point-based matching; and third, the accuracy of the matching is evaluated.
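The core operation when growing a CART is finding, for one feature, the threshold that best separates the training labels, e.g., by minimizing the weighted Gini impurity. A minimal sketch of that node-splitting step (the feature values and labels are illustrative; they are not the scan features used in this work):

```python
def gini(labels):
    """Gini impurity of a label multiset: 1 - sum of squared class fractions."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, score): the split v <= threshold on a single
    feature that minimizes the weighted Gini impurity of the two children."""
    best_t, best_score = None, float('inf')
    for t in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        if not left or not right:
            continue  # degenerate split, skip
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

A full CART recurses on the two children until the nodes are pure or a depth limit is reached; this sketch shows only the split criterion.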
In the area of object recognition and classification in 3D range data, Johnson and Hebert use the well-known ICP algorithm [4] to register 3D shapes into a common coordinate system [10]. The initial guess required by the ICP algorithm is obtained by detecting the object with spin images [10]. This approach was extended by Shapiro et al. [16]. In contrast to our proposed method, both approaches use local, memory-consuming surface signatures based on previously created mesh representations of the objects. Furthermore, spin images cannot model complicated objects, i.e., objects with non-smooth surfaces or objects for which no mesh representation can be produced. One of the objects used in this paper, the volksbot [1], has such a structure (Fig. ).
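ICP alternates between nearest-neighbor correspondence search and a closed-form estimate of the rigid transform minimizing the squared point-to-point error. The following minimal 2D sketch illustrates that loop (pure Python for brevity; a practical 3D implementation would use k-d trees for the search, an SVD-based transform estimate, and outlier rejection):

```python
import math

def icp_2d(source, target, iterations=20):
    """Minimal point-to-point ICP in 2D. Aligns the source point list to
    the target point list and returns the transformed source points."""
    src = [list(p) for p in source]
    for _ in range(iterations):
        # 1. Correspondences: nearest target point for each source point.
        pairs = [(p, min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2))
                 for p in src]
        n = len(pairs)
        # 2. Closed-form 2D rigid transform (rotation + translation)
        #    minimizing the squared error over the current correspondences.
        mx = sum(p[0] for p, _ in pairs) / n
        my = sum(p[1] for p, _ in pairs) / n
        nx = sum(q[0] for _, q in pairs) / n
        ny = sum(q[1] for _, q in pairs) / n
        s_cos = sum((p[0] - mx) * (q[0] - nx) + (p[1] - my) * (q[1] - ny) for p, q in pairs)
        s_sin = sum((p[0] - mx) * (q[1] - ny) - (p[1] - my) * (q[0] - nx) for p, q in pairs)
        theta = math.atan2(s_sin, s_cos)
        c, s = math.cos(theta), math.sin(theta)
        tx = nx - (c * mx - s * my)
        ty = ny - (s * mx + c * my)
        # 3. Apply the transform to the source cloud and iterate.
        src = [[c * x - s * y + tx, s * x + c * y + ty] for x, y in src]
    return src
```

With exact correspondences the closed-form step is optimal, so for small initial displacements the loop converges in very few iterations; the quality of the initial guess (here provided by the object detection step) determines whether the nearest-neighbor matching finds the correct correspondences at all.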
Besides spin images, several surface representation schemes are in use for computing an initial alignment. Stein and Medioni introduced the notion of a ``splash'' to represent the normals along a geodesic circle around a center point, i.e., a local Gauss map, for 3D object recognition against a database [20]. Ashbrook et al. proposed pairwise geometric histograms to find corresponding facets between two surfaces represented by triangle meshes [3]. Harmonic maps and their use in surface matching have been investigated by Zhang and Hebert [26]. Recently, Sun and colleagues suggested so-called ``point fingerprints'': they compute a set of 2D contours as projections of geodesic circles onto the tangent plane and compute similarities between them [21]. All these approaches rely on the local geometry of structured surfaces, i.e., meshes, and therefore have problems coping with unstructured point clouds.
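The surface-signature idea shared by these schemes can be illustrated with the simplest instance, the spin image mentioned above: each neighbor of an oriented point is mapped to cylindrical coordinates (α, β) relative to the point's normal and accumulated in a 2D histogram. A minimal sketch (bin count and support size are arbitrary assumptions; real spin images also use bilinear interpolation and support-angle filtering):

```python
import math

def spin_image(center, normal, points, bins=4, size=2.0):
    """Accumulate neighbors of an oriented point (center, normal) into a
    bins x bins spin-image histogram. For each neighbor x:
      beta  = n . (x - p)        (signed height along the normal)
      alpha = sqrt(|x - p|^2 - beta^2)   (radial distance from the axis)
    alpha spans [0, size), beta spans [-size, size). Illustrative only."""
    norm = math.sqrt(sum(c * c for c in normal))
    n = [c / norm for c in normal]
    hist = [[0] * bins for _ in range(bins)]
    for x in points:
        d = [x[i] - center[i] for i in range(3)]
        beta = sum(d[i] * n[i] for i in range(3))
        alpha = math.sqrt(max(sum(c * c for c in d) - beta * beta, 0.0))
        i = int(alpha / size * bins)
        j = int((beta + size) / (2 * size) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            hist[j][i] += 1
    return hist
```

Note that the histogram discards the neighbors' angular position around the normal, which is what makes spin images rotation-invariant but also unable to distinguish objects that differ only in that angular structure.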