Feature extraction

Orientation and gradient assignment

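We start by assigning an orientation $\phi(x,y)$ and a gradient magnitude $g(x,y)$ to every pixel of the sample image $I$. Let $I_{x,y}$ be the pixel intensity at position $(x,y)$ in image $I$. The derivatives $d_x$ and $d_y$ are computed with the Sobel operator: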

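$\forall (x,y) \in I:$

$\begin{displaymath}d_y = (I_{x-1,y-1} + 2I_{x,y-1} + I_{x+1,y-1}) - (I_{x-1,y+1} + 2I_{x,y+1} + I_{x+1,y+1})\end{displaymath}$

$\begin{displaymath}d_x = (I_{x+1,y-1} + 2I_{x+1,y} + I_{x+1,y+1}) - (I_{x-1,y-1} + 2I_{x-1,y} + I_{x-1,y+1})\end{displaymath}$

$\begin{displaymath}\phi^{*}(x,y) = \arctan \frac{d_y}{d_x}\end{displaymath}$

$\begin{displaymath}g(x,y) = \sqrt{d_y^2 + d_x^2}\end{displaymath}$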
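The gradient formulas above can be sketched in Python as follows. This is a minimal illustration, not the original implementation. The function names are hypothetical, and the image is assumed to be stored row by row, so `img[y][x]` is $I_{x,y}$. Border pixels are not handled.

```python
import math


def sobel_gradient(img, x, y):
    """Return (d_x, d_y) at interior pixel (x, y) using the formulas above.

    Assumption: img is a list of rows, so img[y][x] is the intensity I_{x,y}.
    Border pixels are not handled; the caller passes 1 <= x, y <= size - 2.
    """
    def px(u, v):
        return img[v][u]

    # Row y-1 minus row y+1, weights 1, 2, 1.
    d_y = (px(x - 1, y - 1) + 2 * px(x, y - 1) + px(x + 1, y - 1)) \
        - (px(x - 1, y + 1) + 2 * px(x, y + 1) + px(x + 1, y + 1))
    # Column x+1 minus column x-1, weights 1, 2, 1.
    d_x = (px(x + 1, y - 1) + 2 * px(x + 1, y) + px(x + 1, y + 1)) \
        - (px(x - 1, y - 1) + 2 * px(x - 1, y) + px(x - 1, y + 1))
    return d_x, d_y


def gradient_magnitude(d_x, d_y):
    """g(x, y) = sqrt(d_y^2 + d_x^2)."""
    return math.sqrt(d_x ** 2 + d_y ** 2)


# A vertical edge: dark on the left, bright on the right.
edge = [[0, 0, 10],
        [0, 0, 10],
        [0, 0, 10]]
d_x, d_y = sobel_gradient(edge, 1, 1)
print(d_x, d_y, gradient_magnitude(d_x, d_y))  # 40 0 40.0
```

A vertical edge responds only in $d_x$. A horizontal edge would respond only in $d_y$.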

We would like to obtain the same features for images (a) and (b) in figure 3.6, because the method has to cope with metal surfaces of various colors, from dark to bright. Inverting the intensities negates both $d_x$ and $d_y$, so the gradient direction flips by $\pi$ while its orientation stays the same. This is why we take $\phi(x,y)$ as the orientation modulo $\pi$:

$\begin{displaymath}\phi(x,y) = \pi + \phi^*(x,y) \pmod \pi\end{displaymath}$
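A possible Python sketch of the orientation modulo $\pi$ is shown below. It uses `math.atan2` instead of $\arctan(d_y/d_x)$ to avoid dividing by zero when $d_x = 0$. This substitution is an assumption of the sketch. Taken modulo $\pi$, the two expressions agree whenever $d_x \neq 0$. The function name is hypothetical.

```python
import math


def orientation_mod_pi(d_x, d_y):
    """phi(x, y): gradient orientation folded into [0, pi).

    math.atan2 replaces arctan(d_y / d_x) so that d_x == 0 needs no special
    case; taken modulo pi, both give the same value whenever d_x != 0.
    """
    phi = math.atan2(d_y, d_x) % math.pi
    # Rounding can yield exactly pi for tiny negative angles; pi and 0 are
    # the same orientation, so fold it back to 0.
    return 0.0 if phi >= math.pi else phi


# Inverting the intensities (dark surface vs. bright surface) negates both
# derivatives; the orientation modulo pi is unchanged.
print(orientation_mod_pi(3, 4), orientation_mod_pi(-3, -4))  # both about 0.927
```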

**Figure 3.6:** The orientation is taken modulo $\pi$ so that (a) and (b) yield the same result
$\includegraphics[width=80mm,height=40mm]{oripi.eps}$

The local image descriptor representations

The previous step assigned an orientation and a gradient magnitude to every pixel in the image. The next step is to compute a descriptor that is highly distinctive yet as invariant as possible to changes in illumination, vertical or horizontal shifts, and viewpoint. An obvious approach would be to sample local image intensities around the keypoint at an appropriate scale and to match them using normalized correlation. However, simple correlation of image patches is highly sensitive to changes that cause misregistration of samples, such as affine or 3D viewpoint changes.

As shown in figure 3.7, the image is divided into several square patches (tiles). Every tile is divided into a certain number of bins. Every bin collects the orientations that fall into a certain interval, which we call a direction. We number the bins $j = 0, \dots, b-1$, where $b$ is the number of bins per tile (i.e. the number of directions per tile). Similarly, we number the tiles consecutively and denote the $i$-th tile by $T_i$.
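The mapping of pixels to tiles and of orientations to bins can be sketched as follows. The function names are hypothetical. The sketch assumes square tiles of `tile_size` pixels laid out on a grid of `tiles_per_row` tiles per row and numbered row by row from 0. Bin $j$ covers the orientations in $[j\pi/b, (j+1)\pi/b)$.

```python
import math


def bin_index(phi, b):
    """Bin j such that phi lies in [j*pi/b, (j+1)*pi/b), for phi in [0, pi)."""
    j = int(phi * b / math.pi)  # phi >= 0, so int() acts as floor()
    return min(j, b - 1)  # guard against phi rounding up to pi


def tile_index(x, y, tile_size, tiles_per_row):
    """Row-major index of the square tile containing pixel (x, y).

    Assumption: tiles are tile_size x tile_size pixels, laid out on a grid
    with tiles_per_row tiles per row, numbered from 0.
    """
    return (y // tile_size) * tiles_per_row + (x // tile_size)


print(bin_index(math.pi / 2, 8))    # 4
print(tile_index(5, 17, 8, 4))      # 8
```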

**Figure 3.7:** Feature extraction
$\includegraphics[width=80mm,height=40.3mm]{fve.eps}$

How the values are assigned to every bin depends on the SIFT representation. We proposed 3 variants:

SIFT-Ori

This is the simplest representation. Every orientation has the same vote, equal to one. Given the $i$-th tile $T_i$, we compute the value of every $j$-th bin as:

$\begin{displaymath}u_{ij} = \left\vert\left\{(x,y) \in T_i : \phi(x,y) \in \left[\frac{j\pi}{b}, \frac{(j+1)\pi}{b}\right)\right\}\right\vert\end{displaymath}$

where $j$ ranges from $0$ to $b-1$.
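A minimal Python sketch of the SIFT-Ori histogram for one tile is shown below. It assumes the orientations $\phi(x,y) \in [0, \pi)$ of the tile's pixels have already been collected into a list. The function name is hypothetical.

```python
import math


def sift_ori_histogram(phis, b):
    """SIFT-Ori: u_ij = number of pixels of tile T_i whose orientation is in bin j.

    phis: orientations phi(x, y) in [0, pi) of all pixels in one tile.
    """
    u = [0] * b
    for phi in phis:
        j = min(int(phi * b / math.pi), b - 1)  # bin [j*pi/b, (j+1)*pi/b)
        u[j] += 1  # every orientation votes with weight one
    return u


print(sift_ori_histogram([0.1, 0.2, 1.7, 3.0], 4))  # [2, 0, 1, 1]
```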

This approach runs into problems when all directions in a tile are equally frequent. This can happen for random texture but also for circle-like shapes. To distinguish between these two cases, we use votes based on the gradient magnitude (SIFT-Grad).

**Figure 3.8:** Possible distinction between random texture and circle-like shapes
$\includegraphics[width=80mm,height=30mm]{circle_texture.eps}$

SIFT-Grad

This representation distinguishes better between random textures and circle-like shapes (figure 3.8). Random textures usually have small local gradients, and we can use this information to separate the two cases as shown in the figure. We do not want their histograms to be the same, as they would be if every direction had the same vote weight. We therefore make the vote weight equal to the gradient magnitude $g(x,y)$. Given the $i$-th tile $T_i$, we have:

$\begin{displaymath}u_{ij} = \sum_{\{(x,y) \in T_i \,:\, \phi(x,y) \in [j\pi/b,\ (j+1)\pi/b)\}}{g(x,y)}\end{displaymath}$
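The SIFT-Grad vote can be sketched as below, assuming each pixel of the tile is given as a pair $(\phi, g)$. The function name and the two example tiles are hypothetical. They mimic the case in figure 3.8: each tile has one pixel in each of four directions, so equal votes would give $[1, 1, 1, 1]$ for both. The gradient-weighted votes separate them.

```python
import math


def sift_grad_histogram(samples, b):
    """SIFT-Grad: each pixel votes with its gradient magnitude g(x, y).

    samples: (phi, g) pairs for all pixels of one tile T_i.
    """
    u = [0.0] * b
    for phi, g in samples:
        j = min(int(phi * b / math.pi), b - 1)  # bin [j*pi/b, (j+1)*pi/b)
        u[j] += g
    return u


phis = [0.1, 0.9, 1.7, 2.5]                 # one pixel per bin for b = 4
texture = [(phi, 1.0) for phi in phis]      # weak local gradients
circle = [(phi, 10.0) for phi in phis]      # strong gradients along the contour
print(sift_grad_histogram(texture, 4))      # [1.0, 1.0, 1.0, 1.0]
print(sift_grad_histogram(circle, 4))       # [10.0, 10.0, 10.0, 10.0]
```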

SIFT-GradWei

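This representation deals better with shifts of the tile. Pixels far from the tile center are the first to leave or enter the tile when it shifts, and they receive smaller votes. Given the $i$-th tile $T_i$ with center $(c_x, c_y)$: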

$\begin{displaymath}u_{ij} = \sum_{\{(x,y) \in T_i \,:\, \phi(x,y) \in [j\pi/b,\ (j+1)\pi/b)\}}{{1\over d(x,y)}g(x,y)}\end{displaymath}$

where $d(x,y)$ is the distance of the point $(x,y)$ from the center $(c_x, c_y)$ of the tile:

$\begin{displaymath}d(x,y)=\sqrt{(x-c_x)^2 + (y-c_y)^2}\end{displaymath}$
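A sketch of the SIFT-GradWei vote follows, assuming each pixel of the tile is given as $(x, y, \phi, g)$. The formula is undefined at the tile center itself, where $d(x,y) = 0$. The sketch therefore clamps the distance to at least 1. This guard is a hypothetical choice made for illustration and is not part of the formula above. The function name is also hypothetical.

```python
import math


def sift_gradwei_histogram(samples, center, b):
    """SIFT-GradWei: votes g(x, y) / d(x, y), d = distance to the tile center.

    samples: (x, y, phi, g) tuples for all pixels of one tile T_i.
    center: (c_x, c_y), the center of T_i.
    Assumption: 1/d is undefined at the center itself, so d is clamped to
    at least 1.0 here; this guard is illustrative, not part of the formula.
    """
    c_x, c_y = center
    u = [0.0] * b
    for x, y, phi, g in samples:
        d = math.hypot(x - c_x, y - c_y)        # Euclidean distance d(x, y)
        j = min(int(phi * b / math.pi), b - 1)  # bin [j*pi/b, (j+1)*pi/b)
        u[j] += g / max(d, 1.0)
    return u


pixels = [(3, 4, 0.1, 10.0),   # distance 5: contributes 10 / 5
          (0, 1, 0.1, 10.0),   # distance 1: contributes 10 / 1
          (0, 0, 2.0, 4.0)]    # the center itself: clamped distance 1
print(sift_gradwei_histogram(pixels, (0, 0), 2))  # [12.0, 4.0]
```

A pixel with the same gradient magnitude contributes five times less at distance 5 than at distance 1. This is what makes the histogram less sensitive to pixels near the tile border.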

Kocurek 2007-12-17