MaGGIe introduces an efficient framework using Cross-Attention, Self-Attention, and Sparse Convolutions for mask-guided instance matting, ensuring high accuracy and low latency.
We introduce our efficient instance matting framework guided by instance binary masks, structured in two parts. Sec. 3.1 details our novel architecture for maintaining both accuracy and efficiency, and Sec. 3.2 describes our approach to ensuring temporal consistency across frames in video processing.
3.1. Efficient Mask-Guided Instance Matting
In cross-attention (CA), Q and (K, V) originate from different sources, whereas in self-attention (SA), Q, K, and V are all derived from the same source.
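To make the CA/SA distinction concrete, here is a minimal numpy sketch of scaled dot-product attention. The tensor names (`image_feats`, `inst_tokens`) and shapes are illustrative assumptions, not the paper's actual architecture: the only point is that cross-attention draws Q and (K, V) from different sources, while self-attention draws all three from one.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 16
image_feats = rng.standard_normal((64, d))  # hypothetical flattened image features
inst_tokens = rng.standard_normal((4, d))   # hypothetical per-instance query tokens

# Cross-attention: Q from instance tokens, (K, V) from image features
ca_out = attention(inst_tokens, image_feats, image_feats)

# Self-attention: Q, K, V all from the instance tokens themselves
sa_out = attention(inst_tokens, inst_tokens, inst_tokens)
```

In both cases the output has one row per query token; only the source of the keys and values differs.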
where {; } denotes concatenation along the feature dimension, and G is a series of sparse convolutions with sigmoid activation.
Authors:
(1) Chuong Huynh, University of Maryland, College Park (chuonghm@cs.umd.edu);
(2) Seoung Wug Oh, Adobe Research (seoh@adobe.com);
(3) Abhinav Shrivastava, University of Maryland, College Park (abhinav@cs.umd.edu);
(4) Joon-Young Lee, Adobe Research (jolee@adobe.com).
This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.