In general, we can distinguish between two approaches: model-based (using stereo vision cameras) and appearance-based. We choose the latter approach.
The classical, model-based approaches to object recognition start from an explicit 3D model of an object's shape. 2D features or primitives such as lines, holes, and circular segments are extracted from the image and matched to the 3D model of the object, taking into account the projection from 3D to 2D. Alternatively, 3D volumetric primitives can be extracted directly from the image and matched against various models. Graphs are often used to represent the spatial relationships between features, in which case recognition becomes a question of (sub)graph matching. The main drawback of the model-based approaches is that they rely heavily on feature extraction, which is often a fragile, error-prone step. Moreover, this approach is only feasible for special object classes composed of primitives that allow easy and robust detection.
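The matching step described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera, a model whose 3D points are already expressed in camera coordinates (that is, the object pose is known), and point features only; the function names, the tolerance, and the sample coordinates are hypothetical and chosen purely for illustration. A real system would also have to estimate the pose and would typically use richer primitives than points.

```python
import math


def project_point(point_3d, focal_length=1.0):
    """Pinhole projection of a 3D point (camera coordinates) onto the image plane."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def match_features(model_points, image_features, focal_length=1.0, tolerance=0.05):
    """Pair each projected model point with its nearest extracted 2D feature.

    Returns a dict mapping model index -> feature index. Model points with no
    feature within `tolerance` (image-plane units) are left unmatched.
    """
    matches = {}
    for i, point in enumerate(model_points):
        u, v = project_point(point, focal_length)
        best_index, best_distance = None, tolerance
        for j, (fu, fv) in enumerate(image_features):
            distance = math.hypot(u - fu, v - fv)
            if distance <= best_distance:
                best_index, best_distance = j, distance
        if best_index is not None:
            matches[i] = best_index
    return matches


# Hypothetical model: three object points in camera coordinates.
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
# Hypothetical extracted 2D features: slightly noisy and in a different order.
features = [(0.51, 0.0), (0.0, 0.26), (0.01, -0.01)]

matches = match_features(model, features)
print(matches)
```

With these made-up values every model point finds a feature within the tolerance. If the feature extractor misses or displaces a feature, the corresponding model point simply goes unmatched, which is the fragility noted above.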
While the model-based approaches can be considered more object-centered, the appearance-based approaches can be thought of as more viewer-oriented. Another difference is that, in general, the appearance-based techniques exploit the available photometric information to a far greater extent (sometimes even exclusively), while the model-based approaches focus on the geometric entities present in the image. As a last distinction, the appearance-based methods are traditionally more empirical, while the model-based approaches try to model analytically the relation between 3D object features and their projections in the image [10].
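To make the viewer-oriented, photometric character of appearance-based recognition concrete, the following sketch compares a query image directly against stored example views using normalized correlation of pixel intensities. This is one simple, generic instance of the idea and not necessarily the method adopted in this work; the labels, the tiny 2x2 images, and the function names are hypothetical.

```python
def normalize(image):
    """Flatten a grayscale image (list of rows) to a zero-mean, unit-norm vector."""
    pixels = [float(p) for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    norm = sum(c * c for c in centered) ** 0.5
    if norm == 0:
        return centered  # uniform image: no contrast to correlate
    return [c / norm for c in centered]


def correlation(image_a, image_b):
    """Normalized correlation between two images of equal size (range -1 to 1)."""
    return sum(a * b for a, b in zip(normalize(image_a), normalize(image_b)))


def recognize(query, stored_views):
    """Return the label of the stored view most correlated with the query."""
    return max(stored_views, key=lambda label: correlation(query, stored_views[label]))


# Hypothetical stored views: one tiny grayscale image per object.
stored_views = {
    "cup": [[10, 10], [200, 200]],
    "box": [[200, 10], [200, 10]],
}
# Hypothetical query: a noisy view resembling the "cup" example.
query = [[20, 15], [180, 190]]

label = recognize(query, stored_views)
print(label)
```

No geometric features are extracted here: the decision depends only on intensity values as seen from a particular viewpoint, so the stored examples must cover the views expected at recognition time.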