Introducing DINOv3: a state-of-the-art computer vision model trained with self-supervised learning (SSL) that produces powerful, high-resolution image features. For the first time, a single frozen vision backbone outperforms specialized solutions on multiple long-standing dense prediction tasks.
Learn more about DINOv3 here: https://ai.meta.com/blog/dinov3-self-supervised-vision-model/?utm_source=twitter&utm_medium=organic_social&utm_content=video&utm_campaign=dinov3
A few highlights of DINOv3 👇
1️⃣SSL enables training on 1.7B images at 7B parameters without labels, a fit for annotation-scarce domains such as satellite imagery
2️⃣Produces excellent high-resolution features and achieves state-of-the-art performance on dense prediction tasks
3️⃣Applies across diverse vision tasks and domains, all with a frozen backbone (no fine-tuning required)
4️⃣Includes distilled smaller models (ViT-B, ViT-L) and ConvNeXt variants for deployment flexibility
To help foster innovation and collaboration in the computer vision community, we’re releasing DINOv3 under a commercial license with a full suite of pre-trained backbones, adapters, training and evaluation code, and (much!) more.
Find them here: https://t.co/V6KnnE9lUI https://t.co/JsOHsI2fwX

We're excited to share Day-0 support for DINOv3 in Hugging Face Transformers, so you can easily use the full family of models.
Find out more on @huggingface here: https://t.co/mGANQJZw3J
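For a feel of what "dense, high-resolution features" means in practice, here's a minimal sketch. The commented-out Transformers calls follow the standard `AutoModel`/`AutoImageProcessor` API, and the checkpoint ID in them is an assumption (check the Hugging Face hub for the real model IDs); the small helper just shows how the patch-token grid grows with input resolution.

```python
# Minimal sketch of frozen-backbone feature extraction.
# NOTE: the commented Transformers calls use the standard AutoModel API;
# the DINOv3 checkpoint ID shown is an assumption, not a verified identifier.

def patch_grid(height: int, width: int, patch_size: int = 16):
    """Grid of patch tokens a ViT-style backbone produces for a given input size."""
    return height // patch_size, width // patch_size

# from transformers import AutoImageProcessor, AutoModel
# import torch
# ckpt = "facebook/dinov3-vitb16-pretrain-lvd1689m"          # assumed ID
# processor = AutoImageProcessor.from_pretrained(ckpt)
# model = AutoModel.from_pretrained(ckpt).eval()
# inputs = processor(images=image, return_tensors="pt")
# with torch.inference_mode():                               # backbone stays frozen
#     features = model(**inputs).last_hidden_state           # one vector per patch token

# Higher input resolution -> denser feature map, which is what dense
# prediction tasks (segmentation, depth estimation) consume:
print(patch_grid(224, 224))    # (14, 14)
print(patch_grid(1024, 1024))  # (64, 64)
```

The key point mirrored from the thread: the backbone is never fine-tuned; task heads read the per-patch features directly.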