My first tweet!
(1/n) Thrilled to share our new work: Context-as-Memory (CaM), tackling the memory problem in video world models!
Our idea: context = memory. By leveraging context, CaM keeps scenes consistent across generations (like Genie 3).
Check out our demo video below! https://t.co/M7G34GfJNy
(2/n) Unlike works that treat 3D as memory, we take a simpler view:
memory = previously generated context frames.
With full context, models can generate consistent scenes.
But referencing all history is computationally heavy, so CaM uses memory retrieval to pick only the useful parts. https://t.co/kk8Z6fnvNR

(3/n) So how does memory retrieval work?
CaM supports camera trajectory control, so each frame has a pose. We compute the FOV overlap between past and future frames, and select only the frames with high overlap as context.
This keeps computation efficient while preserving consistency. https://t.co/p3LffWJxec
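
For readers who want to see the retrieval idea concretely, here is a minimal sketch, not the CaM implementation. It assumes a simplified top-down 2D setup where each pose is (x, y, yaw) and the field of view is a circular sector; the names `Pose`, `fov_overlap`, and `retrieve_context`, along with the default FOV angle, view distance, threshold, and `k`, are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical 2D camera pose: position plus heading (radians)."""
    x: float
    y: float
    yaw: float


def in_fov(pose, px, py, fov, max_dist):
    """Return True if point (px, py) lies inside the pose's view sector."""
    dx, dy = px - pose.x, py - pose.y
    dist = math.hypot(dx, dy)
    if dist > max_dist:
        return False
    if dist == 0.0:
        return True
    # Angle between the heading and the direction to the point, wrapped to [-pi, pi].
    diff = math.atan2(dy, dx) - pose.yaw
    diff = math.atan2(math.sin(diff), math.cos(diff))
    return abs(diff) <= fov / 2


def sector_samples(pose, fov, max_dist, n_r=8, n_a=9):
    """Deterministic grid of points strictly inside the pose's view sector."""
    pts = []
    for i in range(n_r):
        r = max_dist * (i + 0.5) / n_r
        for j in range(n_a):
            a = pose.yaw - fov / 2 + fov * (j + 0.5) / n_a
            pts.append((pose.x + r * math.cos(a), pose.y + r * math.sin(a)))
    return pts


def fov_overlap(target, past, fov=math.radians(90), max_dist=10.0):
    """Fraction of the target frame's sampled view that the past frame also saw."""
    pts = sector_samples(target, fov, max_dist)
    hits = sum(in_fov(past, px, py, fov, max_dist) for px, py in pts)
    return hits / len(pts)


def retrieve_context(target, history, k=4, min_overlap=0.3, **kw):
    """Indices of up to k past frames with the highest overlap above a threshold."""
    scored = [(fov_overlap(target, p, **kw), i) for i, p in enumerate(history)]
    keep = [(s, i) for s, i in scored if s >= min_overlap]
    keep.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in keep[:k]]


history = [Pose(0, 0, 0.0), Pose(0, 0, math.pi), Pose(0.5, 0, 0.1)]
selected = retrieve_context(Pose(0, 0, 0.0), history, k=2)
print(selected)
```

In this toy example the frame facing the opposite direction shares no view with the target and is dropped, while the two frames looking roughly the same way are kept as context. A real system would work with full 3D camera poses and intrinsics rather than a 2D sector.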

(4/n) For training, we collected long videos across scenes using Unreal Engine 5.
We also explored open-domain internet images for world models, and the results were surprisingly strong, even with a small 1B model.
More results:
Project Page: https://t.co/50Tigs1adK https://t.co/QWuvS9KdeR
(5/n) More of our work in Video Gen & World Models
GameFactory (ICCV'25 Highlight): from Minecraft data to open-domain infinite game worlds.
https://t.co/U5nWilryUS
Position + Survey Papers about video world models.
https://t.co/YwADPzxusf
https://t.co/GMgmeizJtu https://t.co/WRJFzRW3bG

