🚀 My first tweet!
(1/n) Thrilled to share our new work: Context-as-Memory (CaM), which tackles the memory problem in Video World Models!

Our idea: context = memory. By leveraging context, CaM keeps scenes consistent across generations (similar to Genie 3).

🎥 Check out our demo video below! https://t.co/M7G34GfJNy

(2/n) Unlike works that treat 3D representations as memory, we take a simpler view:
👉 memory = past generated context frames.

With the full context, a model can generate consistent scenes.
But referencing the entire history is computationally heavy, so CaM performs memory retrieval to select only the useful frames. https://t.co/kk8Z6fnvNR

(3/n) So how does memory retrieval work? 🤔
CaM supports camera trajectory control, so each frame has a camera pose. We compute the FOV overlap between past & future frames and select only those with high overlap as context.
This keeps computation efficient while preserving consistency. https://t.co/p3LffWJxec
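The thread doesn't say exactly how FOV overlap is computed, so here is a minimal sketch of pose-based retrieval under loudly stated assumptions: cameras live on a 2D ground plane (position + yaw), share a 90° horizontal FOV, and "see" up to a fixed range. Overlap is the fraction of grid samples in the target view sector that also fall in a past frame's sector. `Pose2D`, `fov_overlap`, `retrieve_context`, `threshold`, and `k` are all hypothetical names and parameters, not CaM's code.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Hypothetical simplified camera pose: ground-plane position plus yaw (radians)."""
    x: float
    y: float
    yaw: float


def in_fov(pose, px, py, fov, max_range):
    """True if point (px, py) lies inside the camera's 2D view sector."""
    dx, dy = px - pose.x, py - pose.y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_range:
        return False
    # Angle between the viewing direction and the point, wrapped to [-pi, pi).
    angle = math.atan2(dy, dx) - pose.yaw
    angle = (angle + math.pi) % (2 * math.pi) - math.pi
    return abs(angle) <= fov / 2


def fov_overlap(target, candidate, fov=math.radians(90), max_range=10.0, n=32):
    """Fraction of the target's view sector that is also visible from the candidate.

    The sector is sampled on a deterministic n x n polar grid; radii are
    sqrt-spaced so samples are roughly uniform in area.
    """
    hits = 0
    for i in range(n):
        a = target.yaw - fov / 2 + fov * (i + 0.5) / n
        for j in range(n):
            r = max_range * math.sqrt((j + 0.5) / n)
            px, py = target.x + r * math.cos(a), target.y + r * math.sin(a)
            if in_fov(candidate, px, py, fov, max_range):
                hits += 1
    return hits / (n * n)


def retrieve_context(target, history, k=4, threshold=0.3, **kw):
    """Indices of up to k past frames whose overlap with the target is >= threshold,
    ordered from highest to lowest overlap."""
    scored = [(fov_overlap(target, p, **kw), idx) for idx, p in enumerate(history)]
    kept = [(s, idx) for s, idx in scored if s >= threshold]
    kept.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in kept[:k]]


if __name__ == "__main__":
    history = [
        Pose2D(0, 0, 0.0),               # same view as the target
        Pose2D(0, 0, math.pi),           # looking the opposite way
        Pose2D(1, 0, 0.0),               # shifted forward a little
        Pose2D(0, 0, math.radians(45)),  # rotated by half the FOV
    ]
    target = Pose2D(0, 0, 0.0)
    print(retrieve_context(target, history))
```

The opposite-facing frame scores zero and is dropped, while frames that look at the same region are kept. Capping retrieval at `k` frames bounds the context size no matter how long the generated history grows, which is the efficiency argument in this tweet.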

(4/n) For training, we collected long videos across scenes using Unreal Engine 5.

We also explored using open-domain internet images for world models, and saw surprisingly strong results, even with a small 1B model.

More results 👉
Project Page: https://t.co/50Tigs1adK https://t.co/QWuvS9KdeR

(5/n) More of our work in Video Gen & World Models 🎮✨

GameFactory (ICCV’25 Highlight): from Minecraft data to open-domain infinite game worlds.
🔗 https://t.co/U5nWilryUS
Position + survey papers on video world models:
🔗 https://t.co/YwADPzxusf
🔗 https://t.co/GMgmeizJtu https://t.co/WRJFzRW3bG

Jiwen Yu