The image classification problem: given an image, we wish to annotate it with one (or multiple) class label(s) describing its visual content. Image classification is a prediction task where the goal is to learn from labeled data a function f : X -> Y that maps an input x in the space of images X to an output y in the space of class labels Y. We are especially interested in the case where we have no (positive) labeled samples for some of the classes and still wish to make a prediction. This problem is generally referred to as zero-shot learning.

A solution to zero-shot learning which has recently gained in popularity in the computer vision community consists in introducing an intermediate space A, referred to as the attribute layer. Attributes correspond to high-level properties of the objects which are shared across multiple classes, which can be detected by machines, and which can be understood by humans. If the classes correspond to animals, possible attributes include "has paws", "has stripes" or "is black". The traditional attribute-based prediction algorithm requires learning one classifier per attribute. To classify a new image, its attributes are predicted using the learned classifiers and the attribute scores are combined into class-level scores. This two-step strategy is referred to as Direct Attribute Prediction (DAP).

In this work, we leverage attributes to compute label embeddings. While attributes are used here to define the embedding parameters, the label embedding framework is generic enough to accommodate other sources of side information. The class-attribute associations may be binary, or real-valued if we have information about the association strength. In this work, we focus on binary relevance, although one advantage of the label embedding framework is that it can easily accommodate real-valued relevances.