Physical AI
Develop world foundation models to advance physical AI.
Overview
NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models (WFMs), advanced tokenizers, guardrails, and an accelerated data processing and curation pipeline. It is built to power world model training and accelerate physical AI development for autonomous vehicles (AVs) and robots.
Cosmos provides developers with easy access to high-performance world foundation models, data pipelines, and tools to post-train these models for robotics and autonomous driving tasks.
World foundation models are pre-trained on 20 million hours of robotics and driving data to generate world states grounded in physics.
Cosmos WFMs, guardrails, and tokenizers are licensed under the NVIDIA Open Model License, allowing access to all physical AI developers.
Models
A family of pretrained multimodal models that developers can use out-of-the-box for world generation and reasoning, or post-train to develop specialized physical AI models.
Generalist model for world generation and motion prediction from multimodal input. Trained on 9,000 trillion tokens of robotics and driving data and purpose-built for post-training.
Available as Cosmos NIM for accelerated inference anywhere.
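As a rough illustration of how a deployed NIM microservice is typically invoked over HTTP, the sketch below assumes a hypothetical local endpoint, payload schema, and model identifier; treat the field names as placeholders rather than the documented NIM API.

```python
# Minimal sketch of calling a locally deployed Cosmos NIM microservice over HTTP.
# The endpoint path, payload fields, and model name are assumptions for
# illustration only; consult the NIM documentation for the actual API schema.
import requests

NIM_URL = "http://localhost:8000/v1/infer"  # hypothetical endpoint

payload = {
    "model": "cosmos-predict",  # placeholder model identifier
    "prompt": "A robot arm picks up a red cube from a cluttered table.",
    "num_frames": 121,          # assumed generation-length parameter
    "seed": 0,
}

response = requests.post(NIM_URL, json=payload, timeout=600)
response.raise_for_status()

# The response format is likewise assumed; here we expect a URL or encoded blob
# pointing at the generated video.
print(response.json())
```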
Physics-aware world generation conditioned on ground-truth and 3D inputs. Inputs include segmentation maps, depth signals, LiDAR scans, keypoints, trajectories, HD maps, and ground-truth simulation from NVIDIA Omniverse™ for controllable synthetic data generation.
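To make the multimodal conditioning concrete, the snippet below sketches one way the control signals listed above might be organized before being handed to a transfer-style inference or post-training script; every key and file path is a hypothetical placeholder, not a documented Cosmos configuration format.

```python
# Hypothetical bundle of multimodal control signals for physics-aware world
# generation. Keys and paths are illustrative placeholders, not the actual
# Cosmos configuration schema.
conditioning_inputs = {
    "segmentation": "scene_0001/segmentation.mp4",      # per-frame semantic maps
    "depth":        "scene_0001/depth.mp4",             # per-frame depth signals
    "lidar":        "scene_0001/lidar_range.mp4",       # projected LiDAR scans
    "keypoints":    "scene_0001/keypoints.json",        # articulated keypoints
    "trajectory":   "scene_0001/ego_trajectory.json",   # ego or object trajectories
    "hd_map":       "scene_0001/hd_map_render.mp4",     # rendered HD map layers
    "ground_truth": "scene_0001/omniverse_render.mp4",  # Omniverse simulation output
}

prompt = "Golden-hour lighting, light rain, wet asphalt reflections."
```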
Fully customizable, multimodal reasoning model that plans responses based on spatial and temporal understanding.
Trained with vision-language model fine-tuning and reinforcement learning for chain-of-thought reasoning.
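As an illustrative sketch only: reasoning-oriented vision-language models of this kind are commonly served behind an OpenAI-compatible chat API. The endpoint, model name, and image URL below are assumptions, not the documented Cosmos interface.

```python
# Sketch of querying a multimodal reasoning model through an assumed
# OpenAI-compatible endpoint. The base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="cosmos-reason",  # placeholder model identifier
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/warehouse_frame.jpg"}},
                {"type": "text",
                 "text": "The robot must restock the top shelf. "
                         "Reason step by step about what it should do next."},
            ],
        }
    ],
    temperature=0.2,
)

print(completion.choices[0].message.content)
```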
Develop responsible models using Cosmos WFMs, with a pre-guard that filters unsafe inputs and a post-guard that keeps outputs consistent and safe.
Cosmos provides developers with open, high-performance data curation pipelines, tokenizers, training frameworks, and post-training scripts to quickly and easily build specialized world models, such as policy models and vision-language-action (VLA) models, for embodied AI.
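For example, video data can be compressed into compact latents with one of the open Cosmos tokenizers before world-model training. The module, class, and checkpoint names below follow the example usage in the public Cosmos-Tokenizer repository at the time of writing; verify them against the current README.

```python
# Sketch of tokenizing a short video clip with an open Cosmos tokenizer.
# Class and checkpoint names are taken from the Cosmos-Tokenizer repository's
# example usage and should be treated as assumptions to verify.
import torch
from huggingface_hub import snapshot_download                 # fetch checkpoints
from cosmos_tokenizer.video_lib import CausalVideoTokenizer   # assumed module path

model_name = "Cosmos-Tokenizer-CV4x8x8"  # continuous video tokenizer, 4x8x8 compression
ckpt_dir = snapshot_download(repo_id=f"nvidia/{model_name}")  # example repo id

# Dummy clip: batch x channels x frames x height x width, bfloat16 on GPU.
video = torch.randn(1, 3, 9, 512, 512, dtype=torch.bfloat16, device="cuda")

encoder = CausalVideoTokenizer(checkpoint_enc=f"{ckpt_dir}/encoder.jit")
(latent,) = encoder.encode(video)  # compact latent for downstream world-model training
print(latent.shape)
```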
Developers post-train Cosmos WFMs or couple them with NVIDIA Omniverse to drive downstream physical AI use cases.
Cosmos accelerates synthetic data generation (SDG) to train perception AI models.
Omniverse provides generative APIs, tools, and NVIDIA RTX™ rendering to create physically accurate, ground-truth 3D scenes for Cosmos WFMs. Using these visuals as inputs, the Cosmos Transfer WFM generates photorealistic outputs, simulating diverse weather, environments, and lighting, while predicting world states with physical accuracy based on text prompts.
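The sketch below outlines this Omniverse-to-Cosmos flow as plain Python; run_cosmos_transfer() is a hypothetical stand-in for the actual Cosmos Transfer inference entry point, and the directory layout is illustrative.

```python
# Hypothetical end-to-end loop: Omniverse ground-truth renders in, photorealistic
# Cosmos Transfer variations out. run_cosmos_transfer() is a placeholder, not an
# actual Cosmos API.
from pathlib import Path

def run_cosmos_transfer(ground_truth_clip: Path, prompt: str, out_dir: Path) -> Path:
    """Placeholder for invoking Cosmos Transfer inference on one conditioned clip."""
    out_path = out_dir / f"{ground_truth_clip.stem}_{abs(hash(prompt)) % 10_000}.mp4"
    # ... call the real inference script or NIM endpoint here ...
    return out_path

weather_prompts = [
    "overcast afternoon, light fog",
    "heavy rain at night, wet road reflections",
    "clear sunrise, long shadows",
]

render_dir = Path("omniverse_renders")  # physically accurate ground-truth clips
out_dir = Path("sdg_outputs")
out_dir.mkdir(exist_ok=True)

for clip in sorted(render_dir.glob("*.mp4")):
    for prompt in weather_prompts:
        generated = run_cosmos_transfer(clip, prompt, out_dir)
        print(f"{clip.name} + '{prompt}' -> {generated.name}")
```

Each ground-truth scene only needs to be authored once; varying the text prompts multiplies it into many photorealistic training samples.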
Developers can use generalist Cosmos WFMs out of the box or customize them with their own data for greater precision in downstream SDG.
Cosmos models, guardrails, and tokenizers are available on Hugging Face and GitHub, with resources to tackle data scarcity in training physical AI models. We are committed to driving Cosmos forward: transparent, open, and built for all.
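Checkpoints can be pulled programmatically with standard Hugging Face tooling; the repository id below is an example and may differ from the checkpoint you need.

```python
# Download a Cosmos world foundation model checkpoint from Hugging Face.
# The repo_id is an example; browse the NVIDIA organization on Hugging Face
# for the exact checkpoint (world generation models, tokenizers, guardrails).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="nvidia/Cosmos-1.0-Diffusion-7B-Text2World",  # example repository id
    local_dir="checkpoints/cosmos-text2world-7b",
)
print(f"Checkpoint downloaded to {local_dir}")
```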
Model developers from robotics, autonomous vehicles, and vision AI industries are using Cosmos to accelerate physical AI development.