We classified vehicles into car makes from the frontal view using the SIFT descriptor and its variants. We used two databases: one containing car images and one containing truck images. We detected the region of interest in each image sample (section 3.1.1, p.16) and extracted features from this region using the SIFT descriptor. We tried 6 SIFT representations with various topologies; they are described in section 3.2.1, p.22.
Every SIFT representation used either overlapping or non-overlapping tiles (section 3.2.1, p.23). We classified in the feature space using k-NN and FLD.
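The difference between overlapping and non-overlapping tiles can be illustrated with a minimal sketch. It is not the implementation from section 3.2.1: the function names, the tile size given in pixels, the half-tile stride for overlapping tiles and the magnitude-weighted orientation histogram are all assumptions made for illustration, and the image is assumed to be a grayscale nested list.

```python
import math

def tile_origins(length, tile, overlap):
    """Start offsets of tiles along one axis.

    Assumption: overlapping tiles advance by half a tile,
    non-overlapping tiles advance by a full tile.
    """
    step = max(1, tile // 2) if overlap else tile
    return list(range(0, length - tile + 1, step))

def orientation_histogram(image, top, left, tile, bins):
    """Gradient-orientation histogram of one square tile.

    Each pixel votes for its orientation bin with its gradient magnitude.
    Border pixels of the image are skipped because central differences
    need both neighbours.
    """
    hist = [0.0] * bins
    for y in range(max(top, 1), min(top + tile, len(image) - 1)):
        for x in range(max(left, 1), min(left + tile, len(image[0]) - 1)):
            dx = image[y][x + 1] - image[y][x - 1]
            dy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            index = min(int(angle / (2 * math.pi) * bins), bins - 1)
            hist[index] += magnitude
    return hist

def tiled_descriptor(image, tile, bins, overlap):
    """Concatenate the histograms of all tiles and L2-normalise the result."""
    descriptor = []
    for top in tile_origins(len(image), tile, overlap):
        for left in tile_origins(len(image[0]), tile, overlap):
            descriptor.extend(orientation_histogram(image, top, left, tile, bins))
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]
```

Under these assumptions, an 8x8 image with 4-pixel tiles gives 2x2 tiles without overlap and 3x3 tiles with overlap, so the overlapping descriptor is longer for the same tile size and number of bins.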
The final classifier consists of SIFT-GradWei (section 3.2.1) and a nearest neighbour classifier (section 3.2.3). It achieved a classification rate of approx. 94% for car images and approx. 92% for truck images.
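A nearest neighbour classifier over descriptor vectors can be sketched as follows. This is a generic k-NN with majority voting, not the code used in section 3.2.3; the names `euclidean` and `knn_classify`, the `(descriptor, label)` data layout and the tie-breaking rule are assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equally long descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, training, k=1, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training samples.

    `training` is a list of (descriptor, label) pairs.
    Assumption: ties are broken in favour of the label whose sample
    is closest to the query.
    """
    neighbours = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    for _, label in neighbours:
        if votes[label] == best:
            return label
```

With `k=1` this reduces to the plain nearest neighbour rule. When k exceeds the number of training samples available for a class, samples of other classes can outvote the correct one.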
We found that the classification rate increases with the number of tiles.
Beyond a certain number of tiles the classification rate stagnates
(figure 4.4 for car images, figure 4.11 for truck images). Every SIFT topology
had a certain number of bins at which the classification rate was best. For the car image database,
SIFT with 15 horizontal square patches, 9 vertical square patches, 15 bins per square patch and
overlapping tiles gave the best results (figure 4.3). For truck images
the best SIFT had 15 square patches in the horizontal direction, 9 square patches in the vertical
direction, 25 bins per square patch and overlapping tiles (figure 4.9). Another interesting
observation was that the Earth Mover's Distance (section 3.2.2, p. 24) performed similarly to
the Euclidean distance (section 3.2.2, p. 23). The comparison of SIFT variants can be seen
in figure 4.5 for car images and in figure 4.12 for truck images. We can see that
SIFT-GradWei with overlapping tiles always performed best, although the results of the variants
are comparable. The biggest difference is approx. 8% for truck images and 12% for car images. We then
estimated the best k for the k-NN algorithm. We found that for truck images the classification rate
decreases only slightly with increasing k (figure 4.14). For car images the
classification rate decreases more rapidly, since we had fewer images per class, and beyond a
certain k it no longer makes sense to classify using that k (figure 4.7). With FLD we tried
to identify the optimal dimension for the projection of the feature space. In both cases it was the
highest possible: 9 for cars and 6 for trucks. We used two classifiers in the FLD-transformed
feature space: 1-NN and nearest
. We found that nearest
performed better, but it was still
worse than k-NN.
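To make the comparison of the two distance measures more concrete, the sketch below computes both on small normalised histograms. It makes strong assumptions that may not match the definitions in section 3.2.2: the histograms are one-dimensional with equal total mass, the bins are treated as linear rather than circular, and the ground distance between adjacent bins is 1. Under these assumptions the Earth Mover's Distance equals the sum of absolute differences of the cumulative histograms. The function names are illustrative.

```python
import math

def euclidean(p, q):
    """Bin-wise Euclidean distance; ignores how far apart the bins are."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def emd_1d(p, q):
    """Earth Mover's Distance for two 1D histograms of equal total mass.

    Assumption: linear (non-circular) bins with unit distance between
    neighbours, so the cost is the running surplus carried from bin to bin.
    """
    distance, carried = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b
        distance += abs(carried)
    return distance

# All mass in one bin, shifted by one bin and by three bins.
p = [1.0, 0.0, 0.0, 0.0]
q = [0.0, 1.0, 0.0, 0.0]
r = [0.0, 0.0, 0.0, 1.0]
```

In this toy case the Euclidean distance from `p` is the same for `q` and `r`, while the Earth Mover's Distance grows with the size of the shift. A similar classification rate for both measures, as observed above, would suggest that such bin shifts did not dominate the differences between classes in the evaluated data.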
The sensitivity was evaluated in 3 areas: image blur, noise addition and training set reduction.
We observed that SIFT is almost invariant to image blur (figure 5.3). That the classification
rate is not affected can be seen in figure 5.2 for truck images
and in figure 5.8 for car images. SIFT is not stable under noise addition with higher
values: the orientation distribution becomes random (figure 5.4), and the classification rate
drops for variance higher than 0.1 for truck images (figure 5.5) and for variance
higher than 0.05 for car images (figure 5.9). The reason for this difference is that
truck images have a higher resolution (1280x512) than car images (256x92). The sensitivity
of the final classifier to training set reduction was good: for a training set reduced to 60% of
its original size we got approx. 14% for car images and approx. 7% for truck images.
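The noise and training set experiments can be outlined with a minimal sketch. It is an illustration under assumptions, not the evaluation code behind the figures: pixel intensities are assumed to lie in [0, 1] (which the variance values 0.05 and 0.1 suggest), the noise is assumed to be zero-mean Gaussian, and the training set is assumed to be reduced by the same fraction within every class. Both function names are hypothetical.

```python
import random

def add_gaussian_noise(image, variance, seed=0):
    """Add zero-mean Gaussian noise of the given variance to a grayscale image.

    Assumption: intensities are floats in [0, 1]; results are clipped
    back to that range.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

def reduce_training_set(samples, fraction, seed=0):
    """Keep a random `fraction` of the (descriptor, label) samples per class.

    Assumption: the reduction is applied class by class, and at least
    one sample per class is always kept.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    reduced = []
    for group in by_label.values():
        keep = max(1, round(len(group) * fraction))
        reduced.extend(rng.sample(group, keep))
    return reduced
```

A sensitivity curve would then be obtained by repeating the classification for a range of `variance` or `fraction` values and recording the classification rate for each.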