Table of Links

Related Work
Method

3.1 Overview of Our Method

3.2 Coarse Text-cell Retrieval

3.3 Fine Position Estimation

3.4 Training Objectives
Experiments

4.1 Dataset Description and 4.2 Implementation Details

4.3 Evaluation Criteria and 4.4 Results
Performance Analysis

5.1 Ablation Study

5.2 Qualitative Analysis

5.3 Text Embedding Analysis
Conclusion and References

Supplementary Material

Anonymous Authors

3.3 Fine Position Estimation

Following the coarse stage, we refine the location prediction within the retrieved cells. Building on the matching-free network [42], we introduce a query instance extractor to mitigate the dependency on ground-truth instances as input. We also propose a relative position-aware cross-attention module that incorporates spatial relation information into the position estimation process.

Relative position-aware cross-attention (RPCA). Within the multi-modal fusion module, we introduce RPCA to integrate spatial relation information with the text-cell features, as shown in Fig. 5. The multi-modal fusion module consists of two cross-attention modules and one RPCA module. In the two cross-attention modules, the text feature serves as the query and the point cloud feature serves as both key and value. RPCA reverses these roles: the point cloud feature is the query, and the text feature is the key and value. To prepare the point cloud feature for RPCA, it first passes through a RowColRPA module, which produces the query for RPCA and also generates the key and value (k1, v1) for the first cross-attention. Concurrently, the text feature passes through two linear layers to obtain the key and value for RPCA. Following the cross-attention operation, an additional RowColRPA is applied to create the key and value (k2, v2) for the second cross-attention. This design incorporates spatial relation features into the multi-modal fusion process.
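
The data flow described above can be sketched in a few lines of plain Python. This is only an illustrative sketch under several assumptions, not the authors' implementation: the internals of RowColRPA are not given in this section, so `spatial_mix` is a hypothetical stand-in that adds a projection of each instance's mean offset to all instances; the ordering of the three attention steps, the random weights, the feature sizes, and the names `fuse`, `spatial_mix`, and `rand_matrix` are likewise assumptions, and residual connections and normalization are omitted.

```python
import math
import random

def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    scale = math.sqrt(len(q[0]))
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    return matmul(weights, v)

def rand_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def spatial_mix(feats, centers, w_rel):
    """HYPOTHETICAL stand-in for RowColRPA (its real design is not shown here):
    add to each instance feature a projection of its mean offset to all instances."""
    n = len(centers)
    mean_center = [sum(c[a] for c in centers) / n for a in range(3)]
    offsets = [[mean_center[a] - c[a] for a in range(3)] for c in centers]
    rel = matmul(offsets, w_rel)  # (n, 3) @ (3, d) -> (n, d)
    return [[f + r for f, r in zip(fr, rr)] for fr, rr in zip(feats, rel)]

def fuse(text, points, centers, dim, seed=0):
    """Assumed ordering: cross-attention 1 -> RPCA -> cross-attention 2."""
    rng = random.Random(seed)
    names = ("q_r", "k1", "v1", "k_t", "v_t", "k2", "v2")
    w = {name: rand_matrix(dim, dim, rng) for name in names}
    w_rel1, w_rel2 = rand_matrix(3, dim, rng), rand_matrix(3, dim, rng)

    # First RowColRPA stand-in: yields the RPCA query and (k1, v1).
    p1 = spatial_mix(points, centers, w_rel1)
    q_r, k1, v1 = matmul(p1, w["q_r"]), matmul(p1, w["k1"]), matmul(p1, w["v1"])

    # First cross-attention: text is the query, point cloud is key and value.
    t1 = attention(text, k1, v1)

    # RPCA: point cloud is the query; two linear layers map text to key and value.
    p2 = attention(q_r, matmul(text, w["k_t"]), matmul(text, w["v_t"]))

    # Second RowColRPA stand-in: yields (k2, v2) for the second cross-attention.
    p3 = spatial_mix(p2, centers, w_rel2)
    return attention(t1, matmul(p3, w["k2"]), matmul(p3, w["v2"]))

rng = random.Random(1)
text = rand_matrix(4, 8, rng)     # 4 text tokens, 8-dim features (illustrative)
points = rand_matrix(5, 8, rng)   # 5 instances, 8-dim features (illustrative)
centers = rand_matrix(5, 3, rng)  # 3D instance centers (illustrative)
fused = fuse(text, points, centers, dim=8)
```

In this sketch the fused output keeps one row per text token, because text is the query in both cross-attention steps, while the point cloud branch is updated once by RPCA in between. Swapping `spatial_mix` for a real pairwise relation encoder would not change the surrounding data flow.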

Authors:

(1) Lichao Wang, FNii, CUHKSZ;

(2) Zhihao Yuan, FNii and SSE, CUHKSZ;

(3) Jinke Ren, FNii and SSE, CUHKSZ;

(4) Shuguang Cui, SSE and FNii, CUHKSZ;

(5) Zhen Li, a Corresponding Author from SSE and FNii, CUHKSZ.

This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.

How RPCA Improves Location Prediction Accuracy
