Improving Flexibility and Performance in Metric-Based Few-Shot Classification
Neural image classifiers have surpassed human performance and attained widespread usage, but they depend on hundreds, if not thousands, of labeled images for each category of interest. This level of image availability is not always realistic: classes of interest may be rare or may require expensive expert annotation. These concerns have fueled interest in few-shot classification, the ability to classify novel object types using only one to five reference images per class. Performance on few-shot classification benchmarks has since improved steadily, and the field remains an active area of research. Unfortunately, the standard few-shot classification benchmarks also make unrealistic assumptions. It is frequently assumed that salient objects are nicely centered and cropped, and that classes are visually distinct. At deployment, it is assumed that only five relevant classes will be present at any given time, that exactly one or five reference images will be available per class, and that the practitioner will always know in advance which of the two it will be. Under real-world conditions, any or all of these assumptions may break; practical few-shot classifiers must be powerful and flexible enough to handle such scenarios. In this work, we show that existing few-shot classifiers underperform when these assumptions are violated, and we introduce three novel techniques that restore and further improve model performance. We provide a proof of concept on a novel few-shot classification benchmark that more closely reflects real-world conditions, then investigate each technique in further detail. First, we incorporate inexpensive location annotations during training to better isolate regions of interest when relevant objects are not scaled, centered, or oriented toward the viewer.
Second, we leverage relationships among and between image components during classification to produce more powerful classifiers when classes are visually similar. Third, we reformulate the few-shot training process to handle greater levels of reference image availability; this facilitates a large-scale study of the effect of availability on classifier performance. We find that a simple alteration to built-in distance metrics restores consistent performance when reference image availability does not match training conditions, so the degree of availability no longer needs to be known in advance. Together, these findings improve the flexibility and power of few-shot classifiers and establish a valuable starting point for deployment in messy, real-world conditions.