Table of Links
- Few-Shot Personalized Instance Recognition
- Object-Conditioned Bag of Instances
- Experimental Results
- Conclusion
- References
2. FEW-SHOT PERSONALIZED INSTANCE RECOGNITION
In our setup, we aim to personalize generic object detection models so that they recognize objects from a small set of instance-level labels.
In the analyses, we implement the model Mo with YOLOv8, pre-training it on MSCOCO [20] and then on To, which contains samples from Open-Images-V7 (OIV7) [21] of the same object-level classes as the personal dataset Ti, i.e., such that f(·) exists. We use the default learning parameters [11] for pre-training. We design several setups in which we assign a few samples to the training set and split the remaining ones into testing (80%) and validation (20%) sets.
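To make the split protocol concrete, here is a minimal sketch (not the authors' released code) of separating a few training samples per instance from the remaining data, with the rest split 80%/20% into testing and validation; the `samples` tuple layout and the `k_shot` parameter are assumptions for illustration.

```python
# Minimal sketch of the few-shot split construction (illustrative assumptions:
# `samples` is a list of (image_path, instance_label, sequence_id) tuples).
import random
from collections import defaultdict

def build_splits(samples, k_shot, seed=0):
    rng = random.Random(seed)
    by_instance = defaultdict(list)
    for s in samples:
        by_instance[s[1]].append(s)

    train, test, val = [], [], []
    for _, items in by_instance.items():
        rng.shuffle(items)
        train.extend(items[:k_shot])          # few labelled samples per instance
        rest = items[k_shot:]
        n_test = int(round(0.8 * len(rest)))  # remaining: 80% testing, 20% validation
        test.extend(rest[:n_test])
        val.extend(rest[n_test:])
    return train, test, val
```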
Datasets. We use either CORe50 [6] or iCubWorld Transformations (iCWT) [22] as the personalized recognition dataset. CORe50: we consider a subset of 45 personal instances (i.e., |Ci| = 45) belonging to 9 object-level classes (i.e., |Co| = 9), acquired over 11 variable-background sequences, i.e., different domains (see Fig. 1). iCWT: we consider a subset of 9 object-level classes with 10 personal instances each, acquired under 5 sequences with diverse affine transformations of the items. On both datasets, we restrict the personalization stage to the frames correctly labelled by YOLOv8n, maintaining a balanced number of samples per instance and per sequence.
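As a rough illustration of this filtering step, the sketch below keeps only frames on which an off-the-shelf YOLOv8n predicts the frame's object-level class and then caps the number of samples per (instance, sequence) pair; the `f_map` instance-to-object mapping, the tuple layout, and the per-bucket cap are assumptions, and the exact matching criterion used in the paper may differ.

```python
# Sketch of frame filtering and balancing (assumptions: `frames` is a list of
# (image_path, instance_id, sequence_id) tuples; f_map gives the detector's
# object-level class index for each instance id).
from collections import defaultdict
from ultralytics import YOLO

def filter_correctly_detected(frames, f_map, conf=0.25):
    """Keep only frames where YOLOv8n predicts the frame's object-level class."""
    model = YOLO("yolov8n.pt")
    kept = []
    for path, instance_id, sequence_id in frames:
        target_cls = f_map[instance_id]                    # f(instance) -> object class
        result = model(path, conf=conf, verbose=False)[0]
        if target_cls in result.boxes.cls.int().tolist():
            kept.append((path, instance_id, sequence_id))
    return kept

def balance(kept, per_bucket):
    """Cap samples per (instance, sequence) pair for a balanced personalization set."""
    buckets = defaultdict(list)
    for item in kept:
        buckets[(item[1], item[2])].append(item)
    return [x for items in buckets.values() for x in items[:per_bucket]]
```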
3. OBJECT-CONDITIONED BAG OF INSTANCES
We propose a lightweight module that can be integrated into any object detection network. Our solution is based on three key components: (i) an object detection network, (ii) a multi-order statistical augmentation of its embeddings, and (iii) instance-level recognition via an OBoI. Next, we outline how we construct our OBoI to personalize object detectors pre-trained on To on the server side (see Fig. 2).
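As one plausible reading of the multi-order statistical augmentation, the sketch below concatenates per-channel moments of increasing order computed from a detection's backbone feature map; the specific orders, pooling, and feature source are assumptions for illustration, not the paper's exact recipe.

```python
# Sketch of a multi-order statistical embedding augmentation (illustrative only).
import torch

def multi_order_embedding(feat, orders=(1, 2, 3)):
    """feat: (C, H, W) feature map for one detection -> (C * len(orders),) vector."""
    flat = feat.flatten(1)                      # (C, H*W)
    mean = flat.mean(dim=1)
    stats = []
    for k in orders:
        if k == 1:
            stats.append(mean)                  # first-order statistic
        else:
            stats.append(((flat - mean[:, None]) ** k).mean(dim=1))  # k-th central moment
    return torch.cat(stats)                     # augmented embedding
```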
The overall pipeline can be thought of as an Object-conditioned Bag of Instances (OBoI), since the generic category-level output is converted into a specific personal-level output via conditional nearest-prototype selection. Our setup and method are fully compatible with the key requirement of continually learning new instances over time [6, 5, 16, 24, 25, 26]: whenever a user presents a new instance to be recognized, we can add new instance-level prototypes to the OBoI at any time with no accuracy degradation with respect to the case where all instances are available from the beginning of the adaptation process.
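The object-conditioned selection and the incremental registration of new instances can be sketched as follows; the class name `OBoI`, the mean-embedding prototypes, and the Euclidean nearest-prototype rule are illustrative assumptions rather than the authors' exact implementation.

```python
# Sketch of an object-conditioned bag of instance prototypes (hypothetical names).
# New instances can be registered at any time; recognition only compares against
# prototypes of the detected object-level class.
import torch

class OBoI:
    def __init__(self):
        self.prototypes = {}    # object_class -> {instance_id: prototype embedding}

    def register(self, object_class, instance_id, embeddings):
        """Add (or update) an instance prototype as the mean of its few-shot embeddings."""
        proto = torch.stack(embeddings).mean(0)
        self.prototypes.setdefault(object_class, {})[instance_id] = proto

    def recognize(self, object_class, embedding):
        """Nearest-prototype selection among instances of the detected object class."""
        bag = self.prototypes.get(object_class, {})
        if not bag:
            return None
        ids, protos = zip(*bag.items())
        dists = torch.cdist(embedding[None], torch.stack(protos))[0]
        return ids[int(dists.argmin())]
```

Because each new instance only adds one prototype to its object-level bag, registration never perturbs the prototypes already stored, which is what makes the continual setting accuracy-preserving in this scheme.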
Authors:
(1) Umberto Michieli, Samsung Research UK;
(2) Jijoong Moon, Samsung Research Korea;
(3) Daehyun Kim, Samsung Research Korea;
(4) Mete Ozay, Samsung Research UK.