Best k for k-NN

Here we can see how the k influence the classification itself. We can see that with increasing k the classification rate is decreasing (figure 4.7 on page [*]) and that there exist some k for which the classification is the best. This is not necessarily the minimum one as we can see that for SIFTs with worse discrimination power (number of square patches: 6x3, 6x4) the peak is for the $k>1$. For SIFTs with number of square patches 8x4 and 15x9 the peak is for $k=1$ and $k=3$ respectively.


Table 4.4: Best k for k-NN and how this influence the classification rate(see figure 4.7 on page [*])
k Sift Topology: 6x3 6x5 8x4 15x9
1   54.17 70.83 89.58 93.75
3   66.67 83.33 72.92 93.75
5   60.42 81.25 70.83 81.25
7   72.92 75 68.75 85.42
9   70.83 66.67 68.75 70.83
11   66.67 56.25 62.5 58.33
13   60.42 54.17 64.58 56.25
15   56.25 54.17 60.42 52.08
17   43.75 47.92 52.08 37.5
19   29.17 41.67 43.75 25
21   25 43.75 39.58 20.83
23   18.75 33.33 31.25 20.83
25   20.83 29.17 29.17 22.92
27   20.83 31.25 25 20.83


Figure 4.7: The best k for k-NN algorithm
\includegraphics[width=85mm,height=70mm]{kForK-NN.eps}

Let us show what is happening on the picture figure 4.8 on page [*]. Let $C$ is a test car and $C_1,\dots,C_N$ are car classes from the learning set. The classification error for $(C,C_1)$ is $e_1$, for $(C,C_i)$ is $e_i$. Leat us assume there is an $E,E>0$ which denotes a threshold for which the assignment between $(C,C_i)=E$ is pointless. In another words $C$ has nothing to do with $C_k$. If we choose some $k$ we are trying to find a class which has a majority between $C_1,\dots,C_k$ observations. The problem is that if from some $j, j<k$ the error $e_j, e_j>E$ exists we are taking into account car class $C_j,\dots,C_k$ which have noting to do with $C$.

The graph on figure 4.7 on page [*] is showing the classification is going worse from $k \ge 5$. The problem was if we had e.g. 1x Fiat Uno with 2 total occurences in class and 4x Skoda Felicie with total number of occurences $>8$. Therefore the relative majority is for Fiat Uno since it has relative occurence of 0.5 whereas Felicie would have at most $4/9<0.5$.

Figure: The explanation to figure 4.7 on page [*]
\includegraphics[width=80mm,height=23mm]{kNN-dec.eps}

Kocurek 2007-12-17