Authors:

(1) Umberto Michieli, Samsung Research UK;

(2) Jijoong Moon, Samsung Research Korea;

(3) Daehyun Kim, Samsung Research Korea;

(4) Mete Ozay, Samsung Research UK.

Abstract and 1. Introduction

  2. Few-Shot Personalized Instance Recognition
  3. Object-Conditioned Bag of Instances
  4. Experimental Results
  5. Conclusion
  6. References

ABSTRACT

Nowadays, users demand increased personalization of vision systems to localize and identify personal instances of objects (e.g., my dog rather than dog) from only a few-shot dataset. Despite the outstanding results of deep networks on classical label-abundant benchmarks (e.g., those of the latest YOLOv8 model for standard object detection), these networks struggle to maintain the within-class variability needed to represent different instances rather than only object categories. We construct an Object-conditioned Bag of Instances (OBoI) based on multi-order statistics of extracted features, where generic object detection models are extended to search and identify personal instances from the OBoI's metric space, without the need for backpropagation. By relying on multi-order statistics, OBoI achieves consistently superior accuracy in distinguishing different instances. In our experiments, we achieve 77.1% personal object recognition accuracy with 18 personal instances, a relative gain of about 12% over the state of the art.

1. INTRODUCTION

Smart devices are becoming ubiquitous in everyday life [1], and their users demand instance-level personalized detection from the vision systems mounted on such devices [2, 3]. For example, vacuum cleaners can now monitor the behavior of users' specific pets and stay away from the particular pets that are most scared by the robot's noise [4]. However, users rarely provide many labelled examples, since labelling is time-consuming. Therefore, we introduce a new task of few-shot instance-level personalization of object detection models to detect and recognize personal instances of objects (e.g., dog1 and dog2 rather than just dog). The limited availability of data distinguishes our task from previous instance-level personalization attempts [5, 6]. To the best of our knowledge, previous works assume abundant labelled data and fine-tune (FT) the models through computationally expensive updates. However, FT-based methods inevitably fail when only few-shot samples are provided [7, 8, 9, 10].

In our work, we use the recent and efficient YOLOv8 [11] detection model, and we enable personalized instance recognition via backpropagation-free Prototype-based Few-Shot Learners (PFSLs), such as [12, 13]. In short, PFSLs learn a metric space in which classification is performed by computing distances to a prototypical representation of each class.
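As a rough illustration of prototype-based classification, the minimal sketch below computes one prototype per class as the mean of its support embeddings and labels each query with the class of the nearest prototype under squared Euclidean distance, in the spirit of Prototypical Networks. The function names `compute_prototypes` and `classify` are hypothetical, and the toy 2-D vectors stand in for encoder features; the PFSLs cited above may use different distances or prototype definitions.

```python
import numpy as np


def compute_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype (hypothetical helper)."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def classify(query, classes, protos):
    """Assign each query embedding to the class of its nearest prototype."""
    # Squared Euclidean distances, shape (num_queries, num_classes)
    d = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]


# Toy few-shot support set: two shots for each of two classes
support = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 4.8]])
labels = np.array([0, 0, 1, 1])
classes, protos = compute_prototypes(support, labels)
print(classify(np.array([[0.1, 0.1], [4.9, 5.1]]), classes, protos))  # [0 1]
```

No weights are trained here: adding a class only means averaging a few new embeddings, which is what makes this family of learners attractive when backpropagation is unavailable.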

In this context, we extend PFSLs to support object-class-conditioned search, and we call these approaches Object-conditioned Bag of Instances (OBoI), since they contain instance-level prototypes. Our approach enriches any OBoI method by augmenting localized encoder embeddings (EEs) of the input object with multi-order statistics to construct a richer metric space in which instance-specific patterns are separable. We compute augmented EEs (AEEs) via a reduction module, similar to recent pooling schemes [14, 15, 16, 17], that characterizes the distribution of the specific instances from the few-shot labelled data. A concurrent work [14] applies ensemble learning to multi-order features learned separately; however, it targets neither personalized instance recognition nor object detection, and it requires gradient-based training. A backpropagation-free approach, instead, could be especially useful when dynamic compilers are not available for the target hardware. Our OBoIs with AEEs significantly increase model personalization and alleviate neural collapse [18, 19], i.e., a state in which the within-class variability of hidden-layer outputs is completely lost due to the object-level optimization objective. Our main contributions are:
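One possible reading of "multi-order statistics" is sketched below, under assumptions that are not taken from the paper: the localized encoder output for a detected object is a feature map of shape (C, H, W), and the augmented embedding concatenates per-channel spatial statistics of order 1 (mean), order 2 (standard deviation) and order 3 (standardized skewness). The function `augmented_embedding` and this particular choice of statistics are hypothetical; the actual OBoI reduction module may use other orders, normalizations or pooling.

```python
import numpy as np


def augmented_embedding(feature_map, eps=1e-6):
    """Hypothetical AEE: concatenate per-channel statistics of orders 1 to 3.

    feature_map: array of shape (C, H, W) cropped from the encoder output
    for one detected object (an assumption made for this sketch).
    """
    c = feature_map.shape[0]
    x = feature_map.reshape(c, -1)                       # (C, H*W): spatial values per channel
    mean = x.mean(axis=1)                                # 1st order: average pooling
    centered = x - mean[:, None]
    std = np.sqrt((centered ** 2).mean(axis=1))          # 2nd order: spread of activations
    skew = (centered ** 3).mean(axis=1) / (std ** 3 + eps)  # 3rd order: standardized asymmetry
    return np.concatenate([mean, std, skew])             # shape (3 * C,)


rng = np.random.default_rng(0)
aee = augmented_embedding(rng.normal(size=(256, 7, 7)))
print(aee.shape)  # (768,)
```

Average pooling alone keeps only the first-order term, so two instances of the same class with similar mean activations can land on nearby points; the higher-order terms describe how activations are distributed over the object, which is one plausible way to retain the within-class variability that neural collapse removes.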

  1. We propose a novel task of few-shot personalization of object detectors to recognize instances of objects;

  2. We extend PFSLs via object-level conditioning (OBoIs);

  3. We further design a multi-order feature space where personal instances can be separated via backpropagation-free metric learning on few-shot labelled user data only;

  4. OBoIs provide superior results on both same-domain and other-domain data (11-22% and 7-18% relative gains, respectively).
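The object-level conditioning and backpropagation-free enrollment can be sketched as follows. This assumes the detector supplies an object class (e.g., "dog") and an embedding for each detected box; `ObjectConditionedBag`, `enroll` and `identify` are hypothetical names, instance prototypes are plain means of the enrolled embeddings, and the distance is Euclidean. Enrolling a new instance only stores vectors, so no gradients or weight updates are involved.

```python
from collections import defaultdict

import numpy as np


class ObjectConditionedBag:
    """Hypothetical OBoI sketch: instance prototypes grouped by object class."""

    def __init__(self):
        # object class -> {instance name -> list of few-shot embeddings}
        self._support = defaultdict(lambda: defaultdict(list))

    def enroll(self, object_class, instance, embedding):
        """Store one few-shot embedding; no gradients or weight updates."""
        self._support[object_class][instance].append(np.asarray(embedding, dtype=float))

    def identify(self, object_class, embedding):
        """Search only the instances of the detected class; None if none are enrolled."""
        instances = self._support.get(object_class)  # .get does not create a new key
        if not instances:
            return None
        names = list(instances)
        protos = np.stack([np.mean(instances[n], axis=0) for n in names])
        dists = np.linalg.norm(protos - np.asarray(embedding, dtype=float), axis=1)
        return names[int(np.argmin(dists))]


bag = ObjectConditionedBag()
bag.enroll("dog", "dog1", [1.0, 0.0])
bag.enroll("dog", "dog2", [0.0, 1.0])
bag.enroll("cat", "cat1", [1.0, 0.1])
print(bag.identify("dog", [0.9, 0.2]))  # dog1
print(bag.identify("car", [0.9, 0.2]))  # None
```

In the usage lines, cat1 is actually the closest prototype to the query overall, but because the search is conditioned on the detected class dog, only dog1 and dog2 are compared. A practical version would cache the prototypes instead of recomputing them on every query.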

This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.