Abstract and 1. Introduction

  2. Related Work

    2.1. Motion Reconstruction from Sparse Input

    2.2. Human Motion Generation

  3. SAGE: Stratified Avatar Generation and 3.1. Problem Statement and Notation

    3.2. Disentangled Motion Representation

    3.3. Stratified Motion Diffusion

    3.4. Implementation Details

  4. Experiments and Evaluation Metrics

    4.1. Dataset and Evaluation Metrics

    4.2. Quantitative and Qualitative Results

    4.3. Ablation Study

  5. Conclusion and References

Supplementary Material

A. Extra Ablation Studies

B. Implementation Details

In this supplementary material, we provide additional ablation studies on the design choices of SAGE Net, along with further implementation details.

A. Extra Ablation Studies

A.1 Input sequence length

Our model follows the online inference setting of [18, 54]: it processes sparse tracking signals from the past N frames and predicts the full-body motion of the final frame. As indicated in [11, 18, 54], the length of the input sequence is a critical factor in the model’s performance and involves a trade-off between efficiency and effectiveness. It is therefore important for our model to handle shorter sequences well, since a model that maintains its performance on shorter inputs also significantly reduces computational cost.
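The online setting described above can be sketched as a sliding window over the incoming tracking stream. This is a minimal illustration, not the authors' implementation: the names `OnlineWindow` and `run_online`, the per-frame feature layout, and the choice to pad the first windows by repeating the earliest frame are all assumptions made for the example, and `model` stands in for any network that maps N frames of sparse input to the full-body pose of the last frame.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

Frame = List[float]  # one frame of sparse tracking features (hypothetical layout)


class OnlineWindow:
    """Keeps the most recent N frames of sparse tracking input."""

    def __init__(self, n_frames: int) -> None:
        if n_frames < 1:
            raise ValueError("n_frames must be positive")
        self.n_frames = n_frames
        # deque with maxlen drops the oldest frame automatically
        self.buffer: Deque[Frame] = deque(maxlen=n_frames)

    def push(self, frame: Frame) -> List[Frame]:
        """Add the newest frame and return a window of exactly N frames.

        Until N frames have arrived, the earliest frame is repeated at the
        front (a padding choice assumed here, not taken from the paper).
        """
        self.buffer.append(list(frame))
        missing = self.n_frames - len(self.buffer)
        return [self.buffer[0]] * missing + list(self.buffer)


def run_online(
    stream: Iterable[Frame],
    model: Callable[[List[Frame]], object],
    n_frames: int,
) -> list:
    """Call `model` once per incoming frame on the last N frames.

    `model` is a placeholder: it receives the window and should return the
    prediction for the final frame of that window.
    """
    window = OnlineWindow(n_frames)
    return [model(window.push(frame)) for frame in stream]
```

With this structure, N fixes both the per-step input size and the start-up latency, so a smaller N means less data to process for each prediction.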

We evaluate AvatarJLM [54] and our method with different input lengths N under setting S1, as presented in Tab. A. The results show that SAGE Net is more robust to variations in the input sequence length than the baseline, AvatarJLM [54]. Notably, SAGE Net exceeds AvatarJLM’s performance even when using just a quarter of its sequence length (10 frames for our method compared to 40 frames for AvatarJLM).

A.2 Predicting noise

Authors:

(1) Han Feng, Wuhan University (equal contribution, ordered alphabetically);

(2) Wenchao Ma, Pennsylvania State University (equal contribution, ordered alphabetically);

(3) Quankai Gao, University of Southern California;

(4) Xianwei Zheng, Wuhan University;

(5) Nan Xue, Ant Group;

(6) Huijuan Xu, Pennsylvania State University.


This paper is available on arXiv under the CC BY 4.0 DEED license.