Generative AI, the Next Frontier: Video World Foundation Models
Distinguished Software Engineer, NVIDIA
To advance generative AI beyond text-based large language models to video comprehension, industry and research must focus on meticulous curation, filtering, annotation, and selection of high-quality data from millions of videos for downstream fine-tuning tasks. These tasks span practical applications in fields such as robotics, digital twins, and autonomous vehicles. This talk highlights innovations and research across the entire AI lifecycle stack for building a multimodal video foundation model. NVIDIA’s team, tasked with creating this model, will share key insights that enable developers and experts to leverage and enhance state-of-the-art AI models for large-scale inference and for creating high-quality video datasets.