Conclusion

We classified vehicles into car makes from the frontal view using the SIFT descriptor and its variants. We used two databases: one contained car images and the other truck images. We first detected the region of interest in an image sample (section 3.1.1, p. 16), and from this region of interest we extracted features using the SIFT descriptor. We tried six SIFT representations with various topologies (section 3.2.1, p. 22).

Every SIFT representation used either overlapping or non-overlapping tiles (section 3.2.1, p. 23). We then classified in the feature space using k-NN and FLD.
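As a rough illustration of such a tiled representation, a gradient-orientation descriptor over a grid of (optionally overlapping) tiles can be sketched in Python with NumPy. This is a generic sketch, not the thesis implementation: the function name, the half-tile overlap rule, and the per-tile normalisation are assumptions.

```python
import numpy as np

def tiled_orientation_descriptor(img, n_x=15, n_y=9, n_bins=15, overlap=False):
    """Concatenate per-tile gradient-orientation histograms into one vector.

    Hypothetical sketch of a tiled SIFT-style representation: the image is
    split into n_x * n_y tiles (optionally half-overlapping), and each tile
    contributes a magnitude-weighted n_bins orientation histogram.
    """
    gy, gx = np.gradient(img.astype(float))      # gradients along rows, columns
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)  # orientation in [0, 2*pi)
    h, w = img.shape
    step_y, step_x = h / n_y, w / n_x
    feats = []
    for i in range(n_y):
        for j in range(n_x):
            y0, x0 = int(i * step_y), int(j * step_x)
            y1, x1 = int((i + 1) * step_y), int((j + 1) * step_x)
            if overlap:  # grow the tile by half a step on each side (assumed rule)
                y0, x0 = max(0, y0 - int(step_y // 2)), max(0, x0 - int(step_x // 2))
                y1, x1 = min(h, y1 + int(step_y // 2)), min(w, x1 + int(step_x // 2))
            hist, _ = np.histogram(ang[y0:y1, x0:x1], bins=n_bins,
                                   range=(0, 2 * np.pi),
                                   weights=mag[y0:y1, x0:x1])
            s = hist.sum()
            feats.append(hist / s if s > 0 else hist)  # per-tile L1 normalisation
    return np.concatenate(feats)
```

With the defaults above, a descriptor has 15 x 9 x 15 = 2025 dimensions, matching the 15-by-9-patch, 15-bin topology discussed below.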

The final classifier combines the SIFT-GradWei representation (section 3.2.1) with a nearest-neighbour classifier (section 3.2.3). It achieved a classification rate of approx. 94% for car images and approx. 92% for truck images.
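The decision rule of the final classifier can be sketched as a generic k-NN over Euclidean distance (plain NumPy; the function name is an assumption, not the thesis code):

```python
import numpy as np

def nn_predict(train_X, train_y, query, k=1):
    """k-NN with Euclidean distance: majority vote among the k closest
    training descriptors. With k=1 this is the plain nearest-neighbour rule."""
    dists = np.linalg.norm(train_X - query, axis=1)  # distance to every training sample
    nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[int(np.argmax(counts))]            # majority label
```

With k = 1 this reduces to the nearest-neighbour classifier used in the final system.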

We found that the classification rate increases with the number of tiles and stagnates beyond a certain point (figure 4.4 for car images, figure 4.11 for truck images). Every SIFT topology also had a number of bins for which the classification rate peaked: for the car image database the best configuration was SIFT with 15 horizontal square patches, 9 vertical square patches, 15 bins per patch and overlapping tiles (figure 4.3); for truck images it was 15 horizontal patches, 9 vertical patches, 25 bins per patch and overlapping tiles (figure 4.9).

Another interesting observation was that the Earth Mover's Distance (section 3.2.2, p. 24) performed similarly to the Euclidean distance measure (section 3.2.2, p. 23). A comparison of the SIFT variants is shown in figure 4.5 for car images and figure 4.12 for truck images: SIFT-GradWei with overlapping tiles always performed best, but the variants are comparable to each other, with the biggest difference being approx. 8% for truck images and 12% for car images.

We then estimated the best $k$ for the k-NN algorithm. For truck images the classification rate decreases only slightly with increasing $k$ (figure 4.14). For car images it decreases more rapidly, since we had fewer images per class, and beyond a certain $k$ it no longer makes sense to classify (figure 4.7).

With FLD we tried to identify the optimal dimension for the projection of the feature space; in both cases it was the highest possible: 9 for cars and 6 for trucks. We used two classifiers in the FLD-transformed feature space, 1-NN and nearest $\mu$, and found that nearest $\mu$ performed better of the two but was still worse than k-NN.
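The FLD step and the nearest-$\mu$ rule can be illustrated with a textbook Fisher-discriminant sketch in NumPy (not the thesis code; the ridge regularisation term and the helper names are assumptions):

```python
import numpy as np

def fld_projection(X, y, n_dims):
    """Fisher linear discriminant: leading eigenvectors of Sw^{-1} Sb.

    Projects features to at most C-1 dimensions (C = number of classes),
    which is why 9 dimensions suit cars and 6 suit trucks here. A small
    ridge term keeps the within-class scatter Sw invertible.
    """
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))   # within-class scatter
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += 1e-6 * np.eye(d)  # assumed ridge regularisation
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs.real[:, order[:n_dims]]  # project with X @ W

def nearest_mu_predict(Z, y, query_z):
    """Classify by the nearest class mean in the projected space."""
    classes = np.unique(y)
    mus = np.array([Z[y == c].mean(axis=0) for c in classes])
    return classes[int(np.argmin(np.linalg.norm(mus - query_z, axis=1)))]
```

Projecting with `Z = X @ W` and classifying each projected sample against the class means reproduces the nearest-$\mu$ rule compared against 1-NN above.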

The sensitivity of the final classifier was evaluated in three areas: image blur, additive Gaussian noise, and training set reduction.

We observed that SIFT is almost invariant to image blur (figure 5.3); the classification rate is not affected, as can be seen in figure 5.2 for truck images and figure 5.8 for car images. SIFT is not stable under stronger additive noise: the orientation distribution becomes random (figure 5.4), and the classification rate drops for noise variance above 0.1 for truck images (figure 5.5) and above 0.05 for car images (figure 5.9). The reason for this difference is that the truck images have a higher resolution (1280x512) than the car images (256x92). The sensitivity of the final classifier to training set reduction was good: reducing the training set to 60% of its original size cost approx. 14% of the classification rate for car images and approx. 7% for truck images.
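The noise effect can be reproduced in miniature on a synthetic image (a toy check under assumed parameters, not the thesis experiment): as the variance of the added Gaussian noise grows, the gradient-orientation distribution flattens toward uniform.

```python
import numpy as np

def add_gaussian_noise(img, var, seed=0):
    """Add zero-mean Gaussian noise with the given variance to an image in [0, 1]."""
    rng = np.random.RandomState(seed)
    return np.clip(img + rng.normal(0.0, np.sqrt(var), img.shape), 0.0, 1.0)

def orientation_histogram(img, n_bins=16):
    """Magnitude-weighted gradient-orientation histogram, L1-normalised."""
    gy, gx = np.gradient(img.astype(float))
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi),
                           weights=np.hypot(gx, gy))
    return hist / max(hist.sum(), 1e-12)

# Vertical ramp: every gradient points the same way, so the clean
# histogram concentrates all its mass in a single orientation bin.
img = np.tile(np.linspace(0.0, 1.0, 64), (64, 1)).T
for var in (0.0, 0.01, 0.1):
    h = orientation_histogram(add_gaussian_noise(img, var))
    print(f"var={var}: largest bin holds {h.max():.2f} of the mass")
```

On such a low-contrast synthetic ramp the orientations randomise at much lower variances than on the real car and truck images, which have far stronger edges; the qualitative flattening of the histogram is the point of the sketch.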



Kocurek 2007-12-17