This story on HackerNoon has a decentralized backup on Sia.
Transaction ID: u4FLLSq3u6u46Zuv9xCjhJh4e9dkZTmQZIkPHPWtTkk

Netflix’s Void-Model Removes Video Objects Without Breaking Physics

Written by @aimodels44 | Published on 2026/4/10

TL;DR
Void-Model removes objects from video while preserving physical interactions, making edits look natural instead of visually broken.

Model overview

void-model removes objects from videos while preserving the physical interactions those objects create with their environment. Unlike simpler removal tools that only erase the object itself, this model understands that removing a person means objects they were holding should fall, or that removing a support structure leaves dependent items displaced. Built by Netflix, it fine-tunes CogVideoX-5b with interaction-aware conditioning using quadmasks that encode what to remove, overlapping regions, affected areas, and what to keep. This represents a significant step beyond background removal tools that handle semantic segmentation without physics simulation.

Model inputs and outputs

The model takes a source video, a specialized four-value mask, and a text description of the desired scene. It outputs a new video with the specified object removed and all physical consequences of that removal rendered naturally.

Inputs

  • Source video in MP4 format at any resolution
  • Quadmask video encoding four regions: primary object to remove (value 0), overlap regions (value 63), affected regions where objects fall or shift (value 127), and background to keep (value 255)
  • Text prompt describing the scene after removal
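As a concrete illustration, a single quadmask frame under this four-value encoding can be built with numpy. The rectangular regions below are hand-drawn stand-ins; in practice the boolean masks would come from a segmentation tool, not hard-coded coordinates:

```python
import numpy as np

# Hand-drawn stand-in regions at the model's 384x672 working resolution.
H, W = 384, 672

remove_mask = np.zeros((H, W), dtype=bool)    # primary object to erase
remove_mask[100:250, 200:320] = True

overlap_mask = np.zeros((H, W), dtype=bool)   # where the object overlaps another item
overlap_mask[240:260, 200:320] = True

affected_mask = np.zeros((H, W), dtype=bool)  # items that must fall or shift
affected_mask[250:330, 210:300] = True

# Start from "keep" (255) and paint regions from least to most specific,
# so the primary object's value 0 wins wherever regions coincide.
quadmask = np.full((H, W), 255, dtype=np.uint8)
quadmask[affected_mask] = 127
quadmask[overlap_mask] = 63
quadmask[remove_mask] = 0
```

Repeating this per frame (and stacking along a time axis) yields the quadmask video the model consumes.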

Outputs

  • Inpainted video with the object removed and physics-aware scene changes applied
  • Support for up to 197 frames at 384x672 resolution
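A thin wrapper over this I/O contract might look like the sketch below. The class and field names are hypothetical illustrations, not Netflix's actual API:

```python
from dataclasses import dataclass

@dataclass
class VoidModelRequest:
    """Hypothetical request object mirroring the documented I/O contract."""
    source_video: str    # path to an MP4 at any resolution
    quadmask_video: str  # four-value mask video (0 / 63 / 127 / 255)
    prompt: str          # text describing the scene after removal
    num_frames: int = 49

    def validate(self) -> None:
        # The model supports at most 197 frames at 384x672 output.
        if not 1 <= self.num_frames <= 197:
            raise ValueError("num_frames must be between 1 and 197")
```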

Capabilities

The model handles counterfactual video generation by understanding object interactions. When you remove a person holding a coffee cup, the cup falls naturally. When you remove a table, objects that were on it shift appropriately. It achieves temporal consistency across longer clips through optional two-pass refinement, where the first pass handles primary inpainting and the second applies optical flow-warped latent initialization. The architecture uses 3D transformers to maintain coherence across frames while managing memory efficiently with BF16 precision and FP8 quantization.
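The core of the second pass's flow-warped latent initialization can be sketched as a backward warp: each latent pixel in the new frame is sampled from the previous frame's latent at a flow-displaced position. This is a minimal pure-numpy sketch of the general technique; the real pipeline operates on CogVideoX latents with an estimated optical-flow field:

```python
import numpy as np

def warp_latent(prev: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a latent frame prev (C, H, W) by a flow field (2, H, W).

    flow[0] is the horizontal displacement, flow[1] the vertical one;
    out-of-bounds samples are clamped to the frame border.
    """
    C, H, W = prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates: where each target pixel samples from.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    # Bilinear interpolation, vectorized over channels.
    return ((1 - wy) * (1 - wx) * prev[:, y0, x0]
            + (1 - wy) * wx * prev[:, y0, x1]
            + wy * (1 - wx) * prev[:, y1, x0]
            + wy * wx * prev[:, y1, x1])
```

Seeding the second denoising pass with warped first-pass latents, instead of fresh noise, is what carries appearance forward and keeps long clips temporally stable.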

What can I use it for?

Content creators can use this for professional video editing without extensive manual retouching. Film production can remove unwanted objects from scenes while maintaining physical realism. Advertising teams can generate counterfactual scenarios showing how spaces would look with different elements. Research applications include studying how objects interact in physics simulations and generating synthetic datasets for machine learning. The model supports commercial use cases where removing elements while preserving scene coherence creates significant production value.

Things to try

Test the model on videos with clear object interactions like someone holding items, sitting on furniture, or blocking pathways. Start with shorter clips to establish baseline quality before attempting the full 197-frame capacity. Experiment with different text prompts describing the final scene to guide the inpainting process. The two-pass approach works best on longer sequences where temporal consistency becomes critical, so compare single-pass and dual-pass outputs to see the refinement benefit. Try edge cases like removing support structures or objects that occlude large portions of the frame to understand where the physics-aware conditioning shows its advantages over simpler inpainting methods.
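When comparing single-pass and dual-pass outputs, a crude flicker score (our own suggestion, not a metric from the model's authors) gives a quick quantitative read on temporal consistency:

```python
import numpy as np

def temporal_flicker(frames: np.ndarray) -> float:
    """Mean absolute difference between consecutive frames of a (T, H, W, C) video.

    Lower values mean smoother frame-to-frame transitions; compare the score
    of a single-pass output against the two-pass refinement of the same clip.
    """
    frames = frames.astype(np.float32)
    return float(np.mean(np.abs(np.diff(frames, axis=0))))
```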


This is a simplified guide to an AI model called void-model, maintained by Netflix. If you like this kind of analysis, join AIModels.fyi or follow us on Twitter.



Written by
@aimodels44
Among other things, launching AIModels.fyi ... Find the right AI model for your project - https://aimodels.fyi

Topics and tags
artificial-intelligence|software-architecture|void-model|void-model-by-netflix|netflix-ai-model|video-object-removal|physics-aware-editing|video-inpainting