Active Discovery and Imbalanced Learning in Generative-Discriminative Models

The vast majority of sky survey image content is due to well-understood phenomena, and only 0.001% of the data is of interest for astronomers to study. Computer network intrusion detection involves vast amounts of normal user traffic and very few examples of malicious attacks. In computer-vision-based security surveillance of public spaces, observed activities are almost always everyday behaviours, but very rarely there may be a dangerous or malicious activity of interest. All of these classification problems share two interesting properties. The first is highly unbalanced proportions: the vast majority of data occurs in one or more background classes, while the instances of interest for detection are much rarer. The second is unbalanced prior knowledge: the majority classes are typically known a priori, while the rare classes are not. To discover the interesting rare classes and learn to classify them, exhaustive labeling of a large dataset would be required to ensure coverage and sufficient representation of all rare classes. This is often prohibitively expensive, as generating each label may require significant time from a human expert. Our approach handles a priori undiscovered classes by adapting two query criteria and choosing between classifiers. To switch between generative and discriminative classifiers, we use a multi-class generalization of unsupervised classification entropy. Evaluation on a batch of vision and UCI datasets covering various domains and complexities shows that our approach consistently, and often significantly, outperforms existing methods at the important task of simultaneous discovery and classification of rare classes.
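As a rough illustration of how a classification-entropy criterion could drive the choice between a generative and a discriminative classifier, the sketch below computes the mean entropy of each classifier's class posteriors on unlabeled data, normalised by log K for K classes, and selects the classifier with the lower value. This is a minimal sketch under stated assumptions, not the exact criterion used here: the function names classification_entropy and choose_classifier, the normalisation, and the "lower entropy wins" rule are all hypothetical choices made for the example.

```python
import math


def classification_entropy(posteriors):
    """Mean Shannon entropy of per-sample class posteriors, scaled to [0, 1].

    `posteriors` is a list of rows, each row a list of K class probabilities
    produced by one classifier on one unlabeled sample. Dividing by log(K)
    is an assumed normalisation so that values are comparable across K.
    """
    n_classes = len(posteriors[0])
    if n_classes < 2:
        return 0.0
    total = 0.0
    for row in posteriors:
        # Zero-probability terms contribute nothing to the entropy.
        total += -sum(p * math.log(p) for p in row if p > 0.0)
    return total / (len(posteriors) * math.log(n_classes))


def choose_classifier(gen_posteriors, disc_posteriors):
    """Pick the classifier whose posteriors are less ambiguous on average.

    Assumption for illustration only: lower classification entropy is
    taken as the sign of the better-fitting classifier.
    """
    h_gen = classification_entropy(gen_posteriors)
    h_disc = classification_entropy(disc_posteriors)
    return "generative" if h_gen <= h_disc else "discriminative"


if __name__ == "__main__":
    # Hypothetical posteriors over three classes for two unlabeled samples.
    gen = [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]]
    disc = [[0.40, 0.30, 0.30], [0.34, 0.33, 0.33]]
    print(choose_classifier(gen, disc))
```

Because the measure needs only predicted posteriors and no labels, it can be re-evaluated after every query, which is what makes it usable as a switching signal during active learning.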