Approach of V.S. Petrovic, T.F. Cootes [19]

This paper is the closest to our work because it classifies cars by make. A car make here means, e.g., Peugeot 406, Ford Puma or Mercedes Class A.

Recognition Process Description.

The system proposed in this paper is based on locating, extracting and recognising normalized structure samples taken from a reference image patch on the front of the vehicle (figure 2.1). The process starts by locating a reference segment on the object (in this case the number-plate) and defining a Region of Interest (RoI) relative to it. The feature extraction stage then processes the RoI to produce a normalized sample of the structure within it. This structure is expressed as a feature vector of pre-defined length that is representative of the vehicle identity. Finally, simple nearest neighbour classification is used to determine the vehicle type associated with each vector.

**Figure 2.1:** Recognition process presented in [19]

RoI Detection.

The location and scale of a reference structure on the object define a reference frame for the region of interest to be sampled. An RoI defined relative to the number-plate is thus independent of the actual location and scale of the vehicle in the image. Number-plates are assumed to be highly regular rectangles. To locate them, the system in this paper finds all possible right-angle corners using suitably tuned, separable gradient filters. A hierarchical algorithm that aggregates corner points into valid rectangular constellations generates hypotheses for the plate location in the image. A number of scale and aspect constraints remove unsuitable candidates, many of which are caused by the characters on the plate and by regular vehicle structure features. Of the remaining candidates, the one with the best corner-structure fit at each of its corners is chosen. The Region of Interest (RoI) is defined relative to the number-plate coordinates. A region extending $\pm 1.3w$ in width from the number-plate center is used, where $w$ is the number-plate width; its vertical extent is likewise defined relative to the plate (figure 2.2).
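The plate-relative geometry can be illustrated with a minimal sketch. Only the horizontal factor of 1.3 comes from the text above; the function name `roi_from_plate`, the `Box` type and the vertical factors `up` and `down` (given as multiples of the plate width) are hypothetical and have to be supplied by the caller. Image coordinates are assumed to have the y axis pointing down.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float


def roi_from_plate(cx, cy, w, up, down, half_width=1.3):
    """Return an RoI box placed relative to the number-plate.

    cx, cy     -- plate centre in image coordinates (y grows downwards)
    w          -- plate width in pixels
    half_width -- horizontal half-extent in plate widths (1.3 in the text)
    up, down   -- vertical extents in plate widths (hypothetical parameters,
                  the source text does not state their values here)
    """
    return Box(
        left=cx - half_width * w,
        top=cy - up * w,
        right=cx + half_width * w,
        bottom=cy + down * w,
    )
```

Because every extent is a multiple of `w`, scaling the plate position and width by the same factor scales the box by that factor, which is what makes the sampled region independent of the vehicle's scale in the image.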

**Figure 2.2:** The image normalization in [19] was done using this geometry (the image is from our database)

**Figure 2.3:** Feature Extraction in [19]

Feature Extraction.

Feature extraction from the Region of Interest provides the structure representation used to recognise the object. The approach used in vehicle type recognition is illustrated in figure 2.3. Initially the RoI is smoothed and then down-sampled to a desired fixed resolution $N \times M$. For vehicle images, the horizontal resolution is reduced more severely because the structure is significantly more redundant in that direction.
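A minimal sketch of the down-sampling step follows. It assumes block averaging, i.e. a box filter followed by decimation, as the smoothing method; the paper's actual filter is not specified here. The function name `downsample` is hypothetical, and the input size is assumed to be an exact multiple of the output size.

```python
def downsample(image, out_rows, out_cols):
    """Down-sample a grey-level image (list of rows) by block averaging.

    Averaging each block acts as a simple smoothing filter applied
    before the resolution is reduced (an assumed choice of filter).
    """
    rows, cols = len(image), len(image[0])
    if rows % out_rows or cols % out_cols:
        raise ValueError("input size must be a multiple of the output size")
    block_h, block_w = rows // out_rows, cols // out_cols
    result = []
    for r in range(out_rows):
        out_row = []
        for c in range(out_cols):
            # Collect all pixels of the block that maps to output pixel (r, c).
            block = [
                image[r * block_h + i][c * block_w + j]
                for i in range(block_h)
                for j in range(block_w)
            ]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result
```

For example, reducing a 4x8 sample to 2x2 shrinks the horizontal axis by a factor of 4 and the vertical axis by a factor of 2, mirroring the stronger horizontal reduction described above.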

A number of feature extraction algorithms were evaluated, leading to various image representations. All were obtained by applying the given transformation at each pixel, except the spectrum phase, which is the phase of the FFT of the image.

raw image
Sobel edge response
edge orientation $(\arctan {s_x \over s_y})$
direct normalized gradients $({s_x \over {\sqrt{s^2_x + s^2_y}}}, {s_y \over {\sqrt{s^2_x + s^2_y}}})$
locally normalized gradients
square mapped gradients $({{s^2_x - s^2_y} \over {s^2_x + s^2_y}}, {{2s_xs_y} \over {s^2_x + s^2_y}})$
Harris corner response
spectrum phase (FFT) $\phi=\arg FFT(I')$
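Two of the gradient representations listed above can be sketched per pixel as follows. The sketch assumes that $s_x$ and $s_y$ are the horizontal and vertical Sobel responses and that a zero gradient maps to $(0, 0)$; both are assumptions, and all function names are hypothetical. Writing $s_x = r\cos\theta$ and $s_y = r\sin\theta$, the square mapped gradient equals $(\cos 2\theta, \sin 2\theta)$, so it does not change when the edge polarity is reversed.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_at(image, r, c):
    """Return the Sobel responses (s_x, s_y) at interior pixel (r, c)."""
    sx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    sy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
             for i in range(3) for j in range(3))
    return sx, sy


def direct_normalized(sx, sy, eps=1e-12):
    """Direct normalized gradient: (s_x, s_y) scaled to unit length."""
    magnitude = math.hypot(sx, sy)
    if magnitude < eps:          # assumed convention for flat regions
        return 0.0, 0.0
    return sx / magnitude, sy / magnitude


def square_mapped(sx, sy, eps=1e-12):
    """Square mapped gradient: equals (cos 2*theta, sin 2*theta)."""
    energy = sx * sx + sy * sy
    if energy < eps:             # assumed convention for flat regions
        return 0.0, 0.0
    return (sx * sx - sy * sy) / energy, 2.0 * sx * sy / energy
```

A dark-to-bright and a bright-to-dark vertical edge give opposite direct normalized gradients but the same square mapped gradient, which makes the latter insensitive to whether a car is lighter or darker than its surroundings at a given edge.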

Gradients are normalized to the range $(-\pi,+\pi)$. Further robustness was obtained by restricting the orientation to between $0$ and $\pi$. As an alternative to direct mapping, PCA was considered to determine a low-dimensional, optimal structure representation. To improve recognition performance, additional normalization of the structure samples is used. The goal is to emphasize the areas of the rigid structure that exhibit the greatest variation between different classes.

Classification.

Two distance measures were investigated to compare test and registration samples: the dot product $d=1-\mathbf{f}_1^T\mathbf{f}_2$ and the Euclidean distance $d=\vert\mathbf{f}_1 - \mathbf{f}_2\vert$. The identity of the test sample was then determined using the nearest neighbour rule. The two measures gave similar results, with the dot product slightly outperforming the Euclidean distance.
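The nearest neighbour rule with the two distance measures above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: feature vectors are plain lists of floats, the registration set is a list of `(label, vector)` pairs, and all names are hypothetical.

```python
import math


def dot_distance(f1, f2):
    """Dot-product distance d = 1 - f1^T f2."""
    return 1.0 - sum(a * b for a, b in zip(f1, f2))


def euclidean_distance(f1, f2):
    """Euclidean distance d = |f1 - f2|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def nearest_neighbour(test_vector, registered, distance):
    """Return the label of the registered sample closest to test_vector.

    registered -- list of (label, feature_vector) pairs
    distance   -- one of the distance functions above
    """
    label, _ = min(registered, key=lambda item: distance(test_vector, item[1]))
    return label
```

A call such as `nearest_neighbour(sample, registered, dot_distance)` returns the class label of the closest registration sample; swapping in `euclidean_distance` changes only the measure, not the decision rule.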

Database Description.

The database contains over 1000 images in 77 different classes. The images are in color with a resolution of 640x480 pixels. The distance is on average $\approx$ 1.2 m. The camera was not fixed, so there is significant variation in both scale and in-plane rotation ($\pm 5^{\circ}$).

Obtained Results.

The best classification performance, a correct identification rate of 97.7%, was obtained using square mapped gradients. These results were collected using manual RoI detection. When used with the automatic car detection system, 93.3% of the samples were classified correctly.

Another approach is presented in [20], which describes a multi-cue car detection technique for still outdoor images. On the bottom level, two area templates based on an edge cue and an interest-point cue are designed first; these reject most non-car sub-windows. On the top level, both a global structure cue and a local texture cue are considered. To characterize the global structure property, odd Gabor moments are introduced and trained by Support Vector Machines (SVM). The local texture property, extracted from corner areas using multi-channel even Gabor filters, is modeled as a Gaussian distribution. The final experimental results show that integrating the global structure property and the local texture property is more powerful in discriminating between car and non-car objects; a detection rate of 93% was obtained. The database contains 1000 negative samples, 550 positive samples and 170 test images.

In [35] an approach to vehicle-class recognition from a video clip is presented. Two concepts are introduced: probes consisting of local 3D curve-groups which, when projected into video frames, are features for recognizing vehicle classes in video clips; and Bayesian recognition based on class probability densities for groups of 3D distances between pairs of 3D probes. The most stable image features for vehicle class recognition appear to be image curves associated with 3D ridges on the vehicle surface. These ridges mostly occur at metal/glass interfaces, at two-surface intersections such as back and side, and at self-occluding contours such as wheel wells or vehicle-body apparent contours, i.e., silhouettes. There are other detectable surface curves, but most do not provide useful discriminatory features, and many of these are clutter, i.e., due to reflections from the somewhat shiny vehicle surface. Models are built and used for the considerable variability that exists in the features used. A Bayesian recognizer is then used for vehicle class recognition from a sequence of frames. The ultimate goal is a recognizer that deals with essentially all classes of civilian vehicles seen from arbitrary directions, at a broad range of distances and under lighting ranging from sunny to cloudy. Experiments are run with a small set of classes to prove feasibility. This work uses estimated knowledge of the motion and position of the vehicle; one way of inferring that information, which uses 1D projectivity invariance, is indicated. The recognition rate achieved is 88%, which is comparable to [20].

In [36] a method for recognizing vehicle classes using computer graphics (CG) is described. In previous work this group developed a vehicle recognition system based on local-feature configuration, which is a generalization of the eigen-window method. This system could recognize one vehicle class very accurately, but it was limited in recognizing several classes when they were quite similar to each other. The paper [36] describes improvements to this recognition system that allow it to distinguish four classes, namely sedan, wagon, mini-van and hatchback. The system requires training images of all target vehicle classes. These training images are easily created using a 3-dimensional computer graphics (3D CG) tool; CG images were used for training because this dispenses with much of the trouble of collecting real training images. Outdoor experiments have shown that this recognition system can classify vehicles in real images with an accuracy of 83%, but quite small training and testing sets are used: 50 images for training and 16 images for testing.

The paper [32] describes a method for recognizing the classes of street-parking vehicles. The authors combine two previously developed systems: a vehicle recognition system based on local-feature configuration and a system for detecting street-parking vehicles from side-view range images. The combined system makes it possible not only to count the street-parking cars but also to recognize their vehicle class. It can recognize four vehicle classes: sedan, wagon, mini-van and hatchback. Outdoor experiments have shown an accuracy of 79%. Compared to [36], the database was extended to 34 testing images.

In [34] an approach is presented for learning to detect objects in still gray images, based on a sparse, part-based representation of objects. A vocabulary of information-rich object parts is automatically constructed from a set of sample images of the object class of interest. Images are then represented using parts from this vocabulary, along with the spatial relations observed among them. Based on this representation, a learning algorithm is used to learn to detect instances of the object class. The framework can be applied to any object with distinguishable parts in a relatively fixed spatial configuration. The reported experiments show robustness to partial occlusion and background variation. In addition, the paper discusses and offers solutions to several methodological issues that are significant for the research community to be able to evaluate object detection approaches.

Kocurek 2007-12-17