Table of Links
- Abstract and Introduction
- Backgrounds
- Type of remote sensing sensor data
- Benchmark remote sensing datasets for evaluating learning models
- Evaluation metrics for few-shot remote sensing
- Recent few-shot learning techniques in remote sensing
- Few-shot based object detection and segmentation in remote sensing
- Discussions
- Numerical experimentation of few-shot classification on UAV-based dataset
- Explainable AI (XAI) in Remote Sensing
- Conclusions and Future Directions
- Acknowledgements, Declarations, and References
9 Numerical experimentation of few-shot classification on UAV-based dataset
To address point 4 in the discussion section, several state-of-the-art (SOTA) few-shot methods were employed to classify disaster scenes using the publicly available AIDER subset dataset. The evaluation covered the Siamese and Triplet Networks, ProtoNet [15], Relation Network [81], Matching Network [129], SimpleShot [130], Task-Dependent Adaptive Metric (TADAM) [131], MAML [125], Meta-Transfer Learning (MTL) [132], and Label Hallucination [133], all of which were originally proposed and evaluated on non-remote-sensing datasets. The aim of the study was to evaluate the effectiveness of these methods in the remote sensing setting. To compare the results obtained on this dataset against those of satellite-based remote sensing image classification, we compared our findings with some of the methods evaluated on UC-Merced by [69]; for the methods not listed there, we performed the simulations under the experimental conditions stipulated by [69].
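As a concrete illustration of the metric-based family above, the following is a minimal sketch of how ProtoNet [15] scores a query image within one N-way K-shot episode: class prototypes are the mean embeddings of each class's support samples, and queries are assigned by negative squared Euclidean distance to the prototypes. This is not the authors' exact implementation; the `encoder` and the helper name `protonet_logits` are assumptions, with `encoder` standing in for any Keras feature extractor such as a ResNet12 backbone.

```python
import tensorflow as tf

def protonet_logits(encoder, support, support_labels, query, n_way):
    """Prototype-based logits for one N-way K-shot episode (illustrative sketch).

    support: (n_way * k_shot, H, W, C) images; support_labels: ids in 0..n_way-1.
    query:   (n_query, H, W, C) images. `encoder` maps images to embeddings.
    """
    z_support = encoder(support)   # (n_way * k_shot, D)
    z_query = encoder(query)       # (n_query, D)
    # Prototype = mean embedding of each class's support samples.
    prototypes = tf.stack([
        tf.reduce_mean(tf.boolean_mask(z_support, support_labels == c), axis=0)
        for c in range(n_way)
    ])                             # (n_way, D)
    # Negative squared Euclidean distance serves as the logit for each class.
    diff = tf.expand_dims(z_query, 1) - tf.expand_dims(prototypes, 0)
    return -tf.reduce_sum(tf.square(diff), axis=-1)   # (n_query, n_way)
```

Training then applies a cross-entropy loss over these logits, one episode per optimization step, which matches the loss setup described below.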
We conducted 5-way 1-shot and 5-way 5-shot classification evaluations. The AIDER subset dataset consists of a total of 6433 images, classified into 5 categories, namely collapsed buildings, fires, floods, traffic accidents, and normal (non-disaster) classes, with 511, 521, 526, and 485 images in the four disaster classes and the remaining 4390 images in the normal class, respectively. The dataset subset is therefore imbalanced, with far more images in the normal class than in the disaster classes, highlighting the potential benefits of few-shot learning approaches, as mentioned in previous sections. Table 8 depicts the train-valid-test split ratio adopted for each class. All images are cropped to a pixel size of 224 × 224 and pre-processed by dividing each original pixel value by 255. The learning rate for each algorithm is set to 0.001. ResNet12 is chosen as the feature extraction backbone for TADAM, ProtoNet, Matching Network, Relation Network, SimpleShot, MTL, and Label Hallucination. For all methods, the common categorical cross-entropy loss is used, except for the Relation Network, which uses a mean-squared error loss. To tackle the problem of class imbalance, the training and validation samples were subjected to under-sampling using the RandomUnderSampler module provided in the imblearn library. All simulations on this dataset were carried out for a total of 200 epochs using the TensorFlow Keras library in Python, and the Google Colab Pro+ platform with Tesla A100, V100, and T4 Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) was employed for computation.
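The under-sampling step described above can be reproduced with the imblearn API. Since RandomUnderSampler expects a 2-D feature matrix, a common idiom (shown here as an illustrative sketch, not the authors' exact code) is to resample the sample indices and then index into the image array; the 1/255 rescaling matches the pre-processing described above.

```python
import numpy as np
from imblearn.under_sampling import RandomUnderSampler

def balance_and_rescale(images, labels, seed=0):
    """Under-sample majority classes (e.g. the normal class), then rescale to [0, 1].

    images: (N, 224, 224, 3) uint8 array; labels: (N,) integer class ids.
    """
    # Resample indices rather than flattened images to keep memory use low.
    idx = np.arange(len(labels)).reshape(-1, 1)
    rus = RandomUnderSampler(random_state=seed)
    idx_res, labels_res = rus.fit_resample(idx, labels)
    images_res = images[idx_res.ravel()]
    return images_res.astype("float32") / 255.0, labels_res
```

With the default sampling strategy, RandomUnderSampler reduces every majority class to the size of the smallest class, which is the behavior needed here given the oversized normal class.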
For the UC-Merced dataset, in addition to the characteristics mentioned in Section 4.2, 10 classes are used as the base training set, 5 classes are set aside as the validation set, and the remaining 6 classes serve as the novel test set. In line with [69], all images are cropped to 84 × 84 for feature extraction using their proposed feature encoder, with the momentum factor set to 0.1 and the learning rate set to 0.001. Because every UC-Merced class contains an equal number of samples, no class-imbalance handling is needed. Once again, the common categorical cross-entropy is used as the loss function, except for the Relation Network, which uses a mean-squared error loss.
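For both datasets, each evaluation episode is formed by drawing N classes and then K support plus Q query images per class. The sketch below shows one plausible way to sample such an episode with NumPy; the `images_by_class` mapping and the helper name `sample_episode` are assumptions made for illustration, not part of the original experimental code.

```python
import numpy as np

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15, rng=None):
    """Sample one N-way K-shot episode from a dict {class_id: array of images}."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(images_by_class), size=n_way, replace=False)
    support, query, support_y, query_y = [], [], [], []
    for episode_label, c in enumerate(classes):
        pool = images_by_class[c]
        picks = rng.choice(len(pool), size=k_shot + n_query, replace=False)
        support.append(pool[picks[:k_shot]])
        support_y += [episode_label] * k_shot
        query.append(pool[picks[k_shot:]])
        query_y += [episode_label] * n_query
    return (np.concatenate(support), np.array(support_y),
            np.concatenate(query), np.array(query_y))
```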
Table 9 presents the results of the simulations carried out on the AIDER subset using the few-shot evaluation approach mentioned earlier, and Table 10 presents the corresponding results on the UC-Merced dataset. The mean accuracy and standard deviation over 10 runs are reported for each method. For the Siamese and Triplet Networks, results are reported only for the 5-way 1-shot evaluation, as only one pair of images is compared per training episode (for the Triplet Network, the anchor image is compared with the positive and the negative image one at a time, so each comparison still involves a single pair). The mean accuracy of the 5-way 5-shot approach is generally higher than that of the 5-way 1-shot approach for all methods on both datasets, in agreement with the earlier statement about the difficulty of few-shot learning with fewer shots. The Siamese Network was found to outperform both the Triplet Network and ProtoNet, demonstrating its effectiveness in feature extraction and embedding. Consistent with the trend observed in a previous study, MTL outperformed TADAM and ProtoNet on both the AIDER subset and UC-Merced, while Label Hallucination yielded the highest performance on the AIDER subset, with a mean accuracy of over 81%.
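The numbers in Tables 9 and 10 are mean ± standard deviation over 10 runs; a minimal sketch of that aggregation, assuming each run yields one episode-averaged accuracy, is:

```python
import numpy as np

def summarize_runs(run_accuracies):
    """Aggregate per-run accuracies (e.g. 10 runs) into (mean, std), as in Tables 9-10."""
    accs = np.asarray(run_accuracies, dtype=np.float64)
    return accs.mean(), accs.std()

# Example: summarize_runs([0.81, 0.80, 0.83]) -> (0.8133..., 0.0124...)
```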
Authors:
(1) Gao Yu Lee, School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore ([email protected]);
(2) Tanmoy Dam, School of Mechanical and Aerospace Engineering, Nanyang Technological University, 65 Nanyang Drive, 637460, Singapore and Department of Computer Science, The University of New Orleans, New Orleans, 2000 Lakeshore Drive, LA 70148, USA ([email protected]);
(3) Md Meftahul Ferdaus, School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore ([email protected]);
(4) Daniel Puiu Poenar, School of Electrical and Electronic Engineering, Nanyang Technological University, 50 Nanyang Ave, 639798, Singapore ([email protected]);
(5) Vu N. Duong, School of Mechanical and Aerospace Engineering, Nanyang Technological University, 65 Nanyang Drive, 637460, Singapore ([email protected]).