This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: 9UyUS9GQaw2rWzpUpnoyHL08RQf9lQXJWThc-L8FejU

FlowVid: Taming Imperfect Optical Flows: Generation: Edit the First Frame Then Propagate

Written by @kinetograph | Published on 2024/10/9

TL;DR
This paper proposes a consistent V2V synthesis framework by jointly leveraging spatial conditions and temporal optical flow clues within the source video.

(1) Feng Liang, The University of Texas at Austin (work partially done during an internship at Meta GenAI) (Email: jeffliang@utexas.edu);

(2) Bichen Wu, Meta GenAI (corresponding author);

(3) Jialiang Wang, Meta GenAI;

(4) Licheng Yu, Meta GenAI;

(5) Kunpeng Li, Meta GenAI;

(6) Yinan Zhao, Meta GenAI;

(7) Ishan Misra, Meta GenAI;

(8) Jia-Bin Huang, Meta GenAI;

(9) Peizhao Zhang, Meta GenAI (Email: stzpz@meta.com);

(10) Peter Vajda, Meta GenAI (Email: vajdap@meta.com);

(11) Diana Marculescu, The University of Texas at Austin (Email: dianam@utexas.edu).

4.3. Generation: edit the first frame then propagate

Another advantageous strategy we discovered is the integration of self-attention features from DDIM inversion, a technique also employed in works such as FateZero [35] and TokenFlow [13]. This integration helps preserve the original structure and motion of the input video. Concretely, we use DDIM inversion to invert the input video with the original prompt and save the intermediate self-attention maps at various timesteps, usually 20. Then, during generation guided by the target prompt, we replace the keys and values within the self-attention modules with the corresponding saved maps.
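To make the save-then-substitute idea concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the names KVCache, attend, and self_attention are hypothetical, the query/key/value projections are replaced by the identity, and the tiny 2-token inputs stand in for real latent features. During an "invert" pass the keys and values are saved per (timestep, layer); during a "generate" pass the saved pair, if present, replaces the current keys and values while the queries still come from the edited features.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, k, v):
    # Scaled dot-product attention over lists of token vectors.
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

class KVCache:
    # Hypothetical store for keys/values saved during inversion.
    def __init__(self):
        self._store = {}

    def save(self, timestep, layer, k, v):
        self._store[(timestep, layer)] = (k, v)

    def load(self, timestep, layer):
        return self._store.get((timestep, layer))

def self_attention(tokens, timestep, layer, cache, mode):
    # Assumption: identity projections, so q = k = v = tokens.
    q, k, v = tokens, tokens, tokens
    if mode == "invert":
        # Inversion pass with the original prompt: remember keys/values.
        cache.save(timestep, layer, k, v)
    elif mode == "generate":
        # Generation pass with the target prompt: substitute saved keys/values.
        saved = cache.load(timestep, layer)
        if saved is not None:
            k, v = saved
    return attend(q, k, v)

cache = KVCache()
source_tokens = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for source-video features
edited_tokens = [[0.5, 0.5], [0.2, 0.8]]  # stand-in for edited features

self_attention(source_tokens, timestep=10, layer="up.0", cache=cache, mode="invert")
out = self_attention(edited_tokens, timestep=10, layer="up.0", cache=cache, mode="generate")
```

In this sketch, each output row is a weighted mix of the saved source values, which is how the substitution carries source structure into the edited result. Timesteps with no saved entry fall back to ordinary self-attention, which loosely mirrors saving maps at only a subset of timesteps.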

This paper is available on arXiv under a CC 4.0 license.




Topics and tags: diffusion-models, image-to-image-synthesis, video-to-video-synthesis, temporal-consistency, v2v-synthesis-framework, spatial-conditions, temporal-optical-flow, flowvid