Cosmos benchmarks are designed to evaluate the next generation of world models against advanced criteria such as 3D consistency and physics alignment, both essential for robotics and autonomous systems.
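Compared to VideoLDM (VLDM), a baseline generative model for video synthesis, Cosmos world foundation models (WFMs) achieve better geometric accuracy, with lower Sampson error (a measure of how far matched points in two frames deviate from the epipolar constraint), and better temporal stability. The benchmarks also evaluate how well WFMs reproduce physical behaviors such as gravity and collision dynamics.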
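To make the Sampson error metric concrete, here is a minimal sketch of the textbook definition for a single point correspondence between two frames. It is not the Cosmos evaluation code: the text does not describe how the benchmark detects keypoints, estimates the fundamental matrix, or aggregates scores, so the function names `sampson_error` and `mean_sampson_error` and the toy fundamental matrix below are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple

Point2 = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def sampson_error(F: Matrix3, p1: Point2, p2: Point2) -> float:
    """First-order (Sampson) approximation of the squared geometric
    distance of a match (p1, p2) from the constraint x2^T F x1 = 0."""
    x1 = (p1[0], p1[1], 1.0)  # homogeneous point in frame 1
    x2 = (p2[0], p2[1], 1.0)  # homogeneous point in frame 2

    # F @ x1 is the epipolar line in frame 2; F^T @ x2 is the line in frame 1.
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    Ftx2 = [sum(F[i][j] * x2[i] for i in range(3)) for j in range(3)]

    residual = sum(x2[i] * Fx1[i] for i in range(3))  # algebraic error
    denom = Fx1[0] ** 2 + Fx1[1] ** 2 + Ftx2[0] ** 2 + Ftx2[1] ** 2
    if denom == 0.0:
        # Degenerate case: both epipolar lines vanish.
        return 0.0 if residual == 0.0 else float("inf")
    return residual ** 2 / denom


def mean_sampson_error(F: Matrix3, matches: Sequence[Tuple[Point2, Point2]]) -> float:
    """Average the per-match error (hypothetical aggregation choice)."""
    return sum(sampson_error(F, a, b) for a, b in matches) / len(matches)


# Toy fundamental matrix for a camera that moves purely sideways:
# matched points must then lie on the same image row (y1 == y2).
F_SIDEWAYS = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, -1.0],
    [0.0, 1.0, 0.0],
]

consistent = sampson_error(F_SIDEWAYS, (10.0, 5.0), (12.0, 5.0))    # same row
inconsistent = sampson_error(F_SIDEWAYS, (10.0, 5.0), (12.0, 7.0))  # 2 px off
```

In this toy setup a match that stays on the same row scores zero, while one that drifts vertically is penalized. A generated video whose frames are geometrically consistent should therefore yield a lower average error over its matched keypoints.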
Cosmos WFMs consistently outperform VLDM on visual consistency, achieving up to 14x higher pose estimation success rates. While diffusion models deliver higher fidelity out of the box, autoregressive models perform very well as a basis for custom models.