This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: NPZ0nA940hCCJao1cglbjdonMlXykxeTyY57fv4rat8

How RPCA Improves Location Prediction Accuracy

Written by @instancing | Published on 2025/7/15

TL;DR
This section introduces a fine-grained location prediction method that leverages a novel Relative Position-Aware Cross-Attention (RPCA) module. By combining point cloud and text features through multi-stage attention mechanisms, the model improves spatial awareness and reduces reliance on labeled ground truth. The design enables more accurate and robust multi-modal AI systems.

Abstract and 1. Introduction

  2. Related Work

  3. Method

    3.1 Overview of Our Method

    3.2 Coarse Text-cell Retrieval

    3.3 Fine Position Estimation

    3.4 Training Objectives

  4. Experiments

    4.1 Dataset Description and 4.2 Implementation Details

    4.3 Evaluation Criteria and 4.4 Results

  5. Performance Analysis

    5.1 Ablation Study

    5.2 Qualitative Analysis

    5.3 Text Embedding Analysis

  6. Conclusion and References

Supplementary Material

  1. Details of KITTI360Pose Dataset
  2. More Experiments on the Instance Query Extractor
  3. Text-Cell Embedding Space Analysis
  4. More Visualization Results
  5. Point Cloud Robustness Analysis

3.3 Fine Position Estimation

Following the coarse stage, we aim to refine the location prediction within the retrieved cells. Building on the matching-free network [42], we introduce the instance query extractor to remove the dependency on ground-truth instances as input. Moreover, we propose a relative position-aware cross-attention module to incorporate spatial relation information into the position estimation process.

Figure 5: Illustration of the relative position-aware multi-modal fusion module. The relative position-aware cross-attention (RPCA) merges potential instance features with text keys and values, infusing the text embeddings with semantic and spatial relation information.

Relative position-aware cross-attention (RPCA). Within the multi-modal fusion module, we introduce the RPCA to integrate spatial relation information with the text-cell features, as shown in Fig. 5. The multi-modal fusion module consists of two cross-attention modules and one RPCA module. The two cross-attention modules use the text feature as the query and the point cloud feature as both key and value. The RPCA reverses these roles: it takes the point cloud feature as the query and the text feature as key and value. To prepare the point cloud feature for the RPCA, it first passes through a RowColRPA module, which produces the query for the RPCA and also generates the key and value (k1, v1) for the first cross-attention. Concurrently, the text feature passes through two linear layers to obtain the key and value. Following the cross-attention operation, an additional RowColRPA is applied to create the key and value (k2, v2) for the second cross-attention. This design incorporates spatial relation features into the multi-modal fusion process.
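
The data flow described above can be sketched in a few lines of NumPy. This is only an illustrative reading of the paragraph, not the authors' implementation: the internals of RowColRPA are not given in this section, so `row_col_rpa_stub` is a hypothetical placeholder that pools pairwise center offsets along rows and columns. The residual connections, the single attention head, the random untrained weights, the exact ordering of the three attention steps, and all function names are likewise assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    """Single-head scaled dot-product attention: query attends to key/value."""
    scores = query @ key.T / np.sqrt(query.shape[-1])
    return softmax(scores, axis=-1) @ value

def linear(dim_in, dim_out):
    """Random, untrained linear map standing in for a learned layer."""
    w = rng.normal(scale=dim_in ** -0.5, size=(dim_in, dim_out))
    return lambda x: x @ w

def row_col_rpa_stub(feats, centers, proj):
    """Hypothetical placeholder for RowColRPA (assumption, not the paper's module):
    pool pairwise relative positions over rows and columns, project, and add."""
    rel = centers[:, None, :] - centers[None, :, :]                        # (N, N, 3)
    pooled = np.concatenate([rel.max(axis=1), rel.max(axis=0)], axis=-1)   # (N, 6)
    return feats + proj(pooled)

def fuse(text, points, centers):
    dim = points.shape[-1]
    rpa1, rpa2 = linear(6, dim), linear(6, dim)
    to_k, to_v = linear(dim, dim), linear(dim, dim)

    p1 = row_col_rpa_stub(points, centers, rpa1)    # RPCA query, also (k1, v1)
    text_k, text_v = to_k(text), to_v(text)         # two linear layers on the text feature
    t1 = text + cross_attention(text, p1, p1)       # first cross-attention: text queries points
    p2 = p1 + cross_attention(p1, text_k, text_v)   # RPCA: points query text
    p3 = row_col_rpa_stub(p2, centers, rpa2)        # second RowColRPA gives (k2, v2)
    t2 = t1 + cross_attention(t1, p3, p3)           # second cross-attention
    return t2, p3

# Toy inputs: 4 text tokens, 6 candidate instances, feature width 16.
text = rng.normal(size=(4, 16))
points = rng.normal(size=(6, 16))
centers = rng.uniform(size=(6, 3))
text_out, points_out = fuse(text, points, centers)
print(text_out.shape, points_out.shape)
```

The point of the sketch is the direction of each attention step: text attends to points twice, points attend to text once, and positional information enters only through the RowColRPA-style steps that precede the point-side keys, values, and query.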

Authors:

(1) Lichao Wang, FNii, CUHKSZ (wanglichao1999@outlook.com);

(2) Zhihao Yuan, FNii and SSE, CUHKSZ (zhihaoyuan@link.cuhk.edu.cn);

(3) Jinke Ren, FNii and SSE, CUHKSZ (jinkeren@cuhk.edu.cn);

(4) Shuguang Cui, SSE and FNii, CUHKSZ (shuguangcui@cuhk.edu.cn);

(5) Zhen Li, Corresponding Author, SSE and FNii, CUHKSZ (lizhen@cuhk.edu.cn).


This paper is available on arXiv under the CC BY-NC-ND 4.0 Deed (Attribution-NonCommercial-NoDerivs 4.0 International) license.
