
Feature Detection using Integral Images


There are many motivations for using features rather than pixels directly. For mobile robots, a critical motivation is that feature-based systems operate much faster than pixel-based systems [#!Viola_2001!#]. The features used here have the same structure as the Haar basis functions, i.e., the step functions introduced by Alfred Haar to define wavelets [#!Haar_1910!#]. They are also used in [#!Lienhart_2003_1!#,#!Lienhart_2002!#,#!Papageorgio_1998!#,#!Viola_2001!#]. Figure 2 (left) shows the six basis features, i.e., edge, line, and center-surround features. The base resolution of the object detector is $20 \times 40$ pixels, so the set of possible features in this area is very large (361760 features). In contrast to the Haar basis functions, the set of rectangle features is not minimal, i.e., it is overcomplete. A single feature is computed efficiently on input images using integral images [#!Viola_2001!#], also known as summed area tables [#!Lienhart_2003_1!#,#!Lienhart_2002!#]. An integral image $I$ is an intermediate representation of the image: its entry $I(x,y)$ contains the sum of the gray scale pixel values of the image $N$ over all pixels $(x',y')$ with $x' \le x$ and $y' \le y$, i.e.,

\begin{eqnarray*}
I(x,y) = \sum_{x'=0}^{x}\sum_{y'=0}^{y} N(x',y').
\end{eqnarray*}

The integral image is computed recursively by the formula $I(x,y) = I(x,y-1) + I(x-1,y) + N(x,y) - I(x-1,y-1)$ with $I(-1,y) = I(x,-1) = 0$, and therefore requires only one scan over the input data. This intermediate representation $I(x,y)$ allows the computation of a rectangle feature value at $(x,y)$ with height $h$ and width $w$ using four references (see Figure 2 (right)):

\begin{eqnarray*}
F(x,y,h,w) & = & I(x,y) + I(x+w,y+h) - I(x,y+h) - I(x+w,y).
\end{eqnarray*}
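
The recurrence and the four-reference lookup above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the detector: the function names (`integral_image`, `lookup`, `rect_sum`, `brute_force_sum`) and the list-of-rows image layout are assumptions made for the example. With the inclusive definition of $I$ given above, $F(x,y,h,w)$ equals the sum of the pixels with $x < x' \le x+w$ and $y < y' \le y+h$, which the brute-force helper reproduces for comparison.

```python
def integral_image(img):
    """Return I with I[y][x] = sum of img[y'][x'] for y' <= y and x' <= x.

    img is a list of rows (img[y][x]). Entries outside the image,
    I(-1, y) and I(x, -1), are treated as 0, as in the recurrence.
    """
    height, width = len(img), len(img[0])
    I = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            up = I[y - 1][x] if y > 0 else 0
            left = I[y][x - 1] if x > 0 else 0
            diag = I[y - 1][x - 1] if x > 0 and y > 0 else 0
            I[y][x] = up + left + img[y][x] - diag
    return I


def lookup(I, x, y):
    """I(x, y) with the boundary convention I(-1, y) = I(x, -1) = 0."""
    return I[y][x] if x >= 0 and y >= 0 else 0


def rect_sum(I, x, y, h, w):
    """F(x, y, h, w) computed with four lookups into the integral image."""
    return (lookup(I, x, y) + lookup(I, x + w, y + h)
            - lookup(I, x, y + h) - lookup(I, x + w, y))


def brute_force_sum(img, x, y, h, w):
    """Reference value: direct sum over x < x' <= x + w, y < y' <= y + h."""
    return sum(img[yy][xx]
               for yy in range(y + 1, y + h + 1)
               for xx in range(x + 1, x + w + 1))


# Hypothetical 4x3 test image (values chosen only for illustration).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
I = integral_image(image)
total = lookup(I, 3, 2)          # sum of all pixels: 78
inner = rect_sum(I, 0, 0, 2, 2)  # 6 + 7 + 10 + 11 = 34
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes evaluating a very large feature set feasible after a single pass over the image.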

Since the features are compositions of rectangles, each feature is computed with several lookups and subtractions, weighted by the areas of the black and white rectangles. To detect a feature, a threshold is required. This threshold is determined automatically during a fitting process such that the number of misclassified examples is minimal. The examples are given as a set of images labeled as positive or negative samples. The same set is also used in the learning phase, which is briefly described next.
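
As a rough illustration of the two steps just described, the following Python sketch evaluates a two-rectangle edge feature from rectangle sums and then fits a threshold by exhaustive search. Several details are assumptions made for the example and are not taken from the text: the zero-padded integral image (so that `rect_sum` addresses a rectangle by its top-left pixel), the choice of left half minus right half with equal weights, the polarity term that lets the threshold act in either direction, and all function names.

```python
def padded_integral_image(img):
    """Integral image with an extra zero row and column in front.

    P[y][x] is the sum of img over rows < y and columns < x, which avoids
    special-casing the border (a convention chosen for this sketch).
    """
    height, width = len(img), len(img[0])
    P = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            P[y + 1][x + 1] = P[y][x + 1] + P[y + 1][x] + img[y][x] - P[y][x]
    return P


def rect_sum(P, x, y, h, w):
    """Sum of the h-by-w rectangle whose top-left pixel is (x, y)."""
    return P[y + h][x + w] - P[y][x + w] - P[y + h][x] + P[y][x]


def edge_feature(P, x, y, h, w):
    """Two-rectangle edge feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(P, x, y, h, half) - rect_sum(P, x + half, y, h, half)


def fit_threshold(values, labels):
    """Return (threshold, polarity, errors) with the fewest misclassifications.

    A sample is predicted positive when polarity * value < polarity * threshold.
    Labels are +1 (positive) or -1 (negative).
    """
    ordered = sorted(set(values))
    # Candidate thresholds: one below all values, midpoints, one above all.
    thresholds = [ordered[0] - 1]
    thresholds += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    thresholds.append(ordered[-1] + 1)
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            errors = sum(
                (polarity * v < polarity * t) != (label == 1)
                for v, label in zip(values, labels)
            )
            if best is None or errors < best[2]:
                best = (t, polarity, errors)
    return best


# Hypothetical patch: bright left half, dark right half.
patch = [[9, 9, 1, 1],
         [9, 9, 1, 1]]
P = padded_integral_image(patch)
value = edge_feature(P, 0, 0, 2, 4)  # 36 - 4 = 32

# Hypothetical feature values for six labeled examples.
feature_values = [10, 12, 3, 2, 11, 4]
labels = [1, 1, -1, -1, 1, -1]
threshold, polarity, errors = fit_threshold(feature_values, labels)
# threshold == 7.0, polarity == -1, errors == 0
```

The exhaustive search is quadratic in the number of examples. Sorting the values once and sweeping the threshold while updating the error count would give the same result faster, at the cost of a slightly longer example.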



root 2004-03-04