Applications for the automatic identification of objects in scene images, called auto-tagging, range from image database organization and visual search to e-commerce. In this paper we discuss our efforts to identify specific pieces of furniture in cluttered scenes, based on ideal models, using object recognition techniques. The extraction of local features and naive nearest neighbor matching followed by a geometric verification is compared to the Bag of Words (BoW) and the Spatial Pyramid Matching Kernel (SPMK) approaches. As expected, the naive matching of local features produced the worst results, whereas the SPMK method yielded the best results and was therefore examined more closely. We propose an additional, spatially based layer of weights to be added to SPMK for this specific application. Our results do not yet satisfy the desired use case, the reliable recognition of furniture, but can serve as a starting point for further research.
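
For orientation, the spatial pyramid matching idea referred to above can be sketched as follows. This is a minimal illustration of the standard pyramid match with histogram intersection, not the implementation evaluated in this paper. It assumes features are already quantized into visual words and given as hypothetical `(x, y, word)` triples with coordinates normalized to [0, 1]. The function names and the `weights` parameter are assumptions introduced for this example; `weights` only marks where a different per-level weighting could be supplied and does not reproduce the weighting proposed here.

```python
from collections import Counter


def pyramid_histograms(features, levels):
    """Build one sparse histogram per pyramid level.

    features: iterable of (x, y, word) with x, y in [0, 1] (assumed format).
    Level l splits the image into 2**l x 2**l cells; each bin is keyed by
    (cell_x, cell_y, word).
    """
    histograms = []
    for level in range(levels + 1):
        cells = 2 ** level
        hist = Counter()
        for x, y, word in features:
            # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
            cx = min(int(x * cells), cells - 1)
            cy = min(int(y * cells), cells - 1)
            hist[(cx, cy, word)] += 1
        histograms.append(hist)
    return histograms


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spm_kernel(feats_a, feats_b, levels=2, weights=None):
    """Weighted sum of per-level histogram intersections.

    With weights=None the standard pyramid weights are used:
    1/2**L for level 0 and 1/2**(L - l + 1) for levels l = 1..L.
    """
    if weights is None:
        weights = [1 / 2 ** levels] + [
            1 / 2 ** (levels - l + 1) for l in range(1, levels + 1)
        ]
    hists_a = pyramid_histograms(feats_a, levels)
    hists_b = pyramid_histograms(feats_b, levels)
    return sum(
        w * intersection(ha, hb)
        for w, ha, hb in zip(weights, hists_a, hists_b)
    )


# Toy data: the same visual word in opposite corners of two images.
image_a = [(0.1, 0.1, 0)]
image_b = [(0.9, 0.9, 0)]
score = spm_kernel(image_a, image_b, levels=2)
```

In this toy case the two features match only at the coarsest level, so the score reflects the level-0 weight alone. Finer levels contribute only when matching words also fall into the same grid cell, which is how the kernel rewards spatial agreement.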