NVIDIA at CVPR 2024

Seattle Convention Center
Seattle, WA

June 17–21

At the Computer Vision and Pattern Recognition (CVPR) conference, NVIDIA researchers shared their latest groundbreaking innovations—including forty-eight papers. Explore the work to see how NVIDIA Research collaborates with CVPR members to deliver AI breakthroughs across the community.


NVIDIA Edify 3D Sweepstakes

Researchers, developers, and enthusiasts are invited to try out the NVIDIA Edify 3D model for a chance to win an NVIDIA GeForce RTX™ 3090 Ti.

Autonomous Driving Solutions

The NVIDIA DRIVE® team is constantly innovating, developing end-to-end autonomous driving solutions that are transforming the industry.

Like No Place You’ve Ever Worked

Working at NVIDIA, you’ll solve some of the world’s hardest problems and discover never-before-seen ways to improve the quality of life for people everywhere. From healthcare to robots, self-driving cars to blockbuster movies, you’ll experience it all. Plus, there’s a growing list of new opportunities every single day. Explore all of our open roles, including internships and new college graduate positions.

Learn more about our current job openings, as well as university jobs.

NVIDIA Research Papers at CVPR 2024

NVIDIA’s accepted papers at CVPR 2024 feature a range of groundbreaking research in the field of computer vision. From human motion forecasting to extracting triangular 3D models, materials, and lighting from images, explore the work NVIDIA is bringing to the CVPR community. 

*Denotes equal contribution to the paper.

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

Yijia Weng (Stanford University) · Bowen Wen (NVIDIA) · Jonathan Tremblay (NVIDIA) · Valts Blukis (NVIDIA) · Dieter Fox (University of Washington) · Leonidas Guibas (Stanford University) · Stan Birchfield (NVIDIA) | Paper

Align Your Gaussians: Text-to-4D With Dynamic 3D Gaussians and Composed Diffusion Models

Huan Ling (NVIDIA, University of Toronto) · Seung Wook Kim (NVIDIA) · Antonio Torralba (MIT) · Sanja Fidler (University of Toronto) · Karsten Kreis (NVIDIA) | Paper

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

Bowen Wen (NVIDIA) · Wei Yang (NVIDIA) · Jan Kautz (NVIDIA) · Stan Birchfield (NVIDIA) | Paper

Rendering Every Pixel for High-Fidelity Geometry in 3D GANs

Alex Trevithick (None) · Matthew Chan (NVIDIA) · Towaki Takikawa (NVIDIA) · Umar Iqbal (None) · Shalini De Mello (NVIDIA Research) · Manmohan Chandraker (University of California, San Diego) · Ravi Ramamoorthi (None) · Koki Nagano (None) | Paper

VILA: On Pretraining for Visual Language Models

Ji Lin (MIT) · Danny Yin (NVIDIA) · Wei Ping (NVIDIA) · Pavlo Molchanov (NVIDIA) · Mohammad Shoeybi (NVIDIA) · Song Han (MIT) | Paper

GAvatar: Animatable 3D Gaussian Avatars With Implicit Mesh Learning

Ye Yuan (NVIDIA Research) · Xueting Li (NVIDIA) · Yangyi Huang (Zhejiang University) · Shalini De Mello (NVIDIA Research) · Koki Nagano (None) · Jan Kautz (NVIDIA) · Umar Iqbal (None) | Paper

JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Woo Kyoung Han (Korea University) · Sunghoon Im (Daegu Gyeongbuk Institute of Science and Technology) · Jaedeok Kim (NVIDIA) · Kyong Hwan Jin (Korea University) | Paper

Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation

Ji-Jia Wu (National Taiwan University) · Andy Chia-Hao Chang (National Yang Ming Chiao Tung University) · Chieh-Yu Chuang (National Yang Ming Chiao Tung University) · Chun-Pei Chen (National Yang Ming Chiao Tung University) · Yu-Lun Liu (National Yang Ming Chiao Tung University) · Min-Hung Chen (NVIDIA) · Hou-Ning Hu (MediaTek Inc.) · Yung-Yu Chuang (National Taiwan University) · Yen-Yu Lin (National Yang Ming Chiao Tung University) | Paper

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Mengqi Zhang (University of California, San Diego) · Yang Fu (University of California, San Diego) · Zheng Ding (University of California, San Diego) · Sifei Liu (NVIDIA) · Zhuowen Tu (University of California, San Diego) · Xiaolong Wang (University of California, San Diego) | Paper

Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation

Razvan Pasca (None) · Alexey Gavryushin (ETH Zurich) · Muhammad Hamza (University of Zurich) · Yen-Ling Kuo (University of Virginia) · Kaichun Mo (NVIDIA Research) · Luc Van Gool (ETH Zurich, KU Leuven, Institute for Computer Science, AI, and Technology) · Otmar Hilliges (None) · Xi Wang (None) | Paper

Snapshot Lidar: Fourier embedding of phasors for single-image depth reconstruction

Sarah Friday (Dartmouth College) · Yunzi Shi (Dartmouth College) · Yaswanth Kumar Cherivirala (University of Michigan, NVIDIA) · Vishwanath Saragadam (University of California, Riverside) · Adithya Pediredla (Dartmouth College) | Paper Coming Soon

Analyzing and Improving the Training Dynamics of Diffusion Models

Tero Karras (NVIDIA) · Miika Aittala (NVIDIA) · Jaakko Lehtinen (Aalto University, NVIDIA) · Janne Hellsten (NVIDIA) · Timo Aila (NVIDIA) · Samuli Laine (NVIDIA) | Paper

BodyMAP—Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed

Abhishek Tandon (Carnegie Mellon University) · Anujraaj Goyal (Carnegie Mellon University) · Henry M. Clever (NVIDIA) · Zackory Erickson (Carnegie Mellon University) | Paper

PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation

Ardian Umam (National Yang Ming Chiao Tung University) · Cheng-Kun Yang (MediaTek) · Min-Hung Chen (NVIDIA) · Jen-Hui Chuang (None) · Yen-Yu Lin (National Yang Ming Chiao Tung University) | Paper

What Moves Together Belongs Together

Jenny Seidenschwarz (Technische Universität München) · Aljoša Ošep (Carnegie Mellon University) · Francesco Ferroni (NVIDIA) · Simon Lucey (University of Adelaide) · Laura Leal-Taixe (NVIDIA) | Paper

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

David Rozenberszki (None) · Or Litany (NVIDIA, Technion) · Angela Dai (Technical University of Munich) | Paper

Producing and Leveraging Online Map Uncertainty in Trajectory Prediction

Xunjiang Gu (University of Toronto) · Guanyu Song (University of Toronto) · Igor Gilitschenski (University of Toronto) · Marco Pavone (NVIDIA) · Boris Ivanovic (NVIDIA) | Paper

SatSynth: Augmenting Image-Mask Pairs Through Diffusion Models for Aerial Semantic Segmentation

Aysim Toker (Technical University Munich) · Marvin Eisenberger (Technical University Munich) · Daniel Cremers (Technical University Munich) · Laura Leal-Taixe (NVIDIA) | Paper

Category-Level Multi-Part Multi-Joint 3D Shape Assembly

Yichen Li (MIT) · Kaichun Mo (NVIDIA Research) · Yueqi Duan (None) · He Wang (None) · Jiequan Zhang (None) · Lin Shao (National University of Singapore) · Wojciech Matusik (MIT) · Leonidas Guibas (Stanford University) | Paper

Dynamic LiDAR Resimulation Using Compositional Neural Fields

Hanfeng Wu (None) · Xingxing Zuo (Caltech) · Stefan Leutenegger (Technische Universität München) · Or Litany (NVIDIA, Technion) · Konrad Schindler (ETH Zurich) · Shengyu Huang (None) | Paper

NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis

Nilesh Kulkarni (None) · Davis Rempe (NVIDIA) · Kyle Genova (Google) · Abhijit Kundu (Google) · Justin Johnson (University of Michigan) · David Fouhey (New York University) · Leonidas Guibas (Stanford University) | Paper

Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer

Rafail Fridman (Weizmann Institute of Science) · Danah Yatim (Weizmann Institute of Science) · Omer Bar-Tal (Weizmann Institute of Science) · Yoni Kasten (NVIDIA Research) · Tali Dekel (Weizmann Institute of Science) | Paper

Make Me a BNN: A Simple Strategy for Estimating Bayesian Uncertainty From Pretrained Models

Gianni Franchi (ENSTA Paris) · Olivier Laurent (Université Paris-Saclay) · Maxence Leguéry (ENSTA Paris) · Andrei Bursuc (valeo.ai) · Andrea Pilzer (NVIDIA) · Angela Yao (National University of Singapore) | Paper

Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction

Cheng Sun (NVIDIA) · Wei-En Tai (National Tsing Hua University) · Yu-Lin Shih (National Tsing Hua University) · Kuan-Wei Chen (National Tsing Hua University) · Yong-Jing Syu (National Tsing Hua University) · Kent Selwyn The (National Tsing Hua University) · Yu-Chiang Frank Wang (NVIDIA) · Hwann-Tzong Chen (National Tsing Hua University) | Paper

Outdoor Scene Extrapolation With Hierarchical Generative Cellular Automata

Dongsu Zhang (Seoul National University) · Francis Williams (NVIDIA) · Žan Gojčič (NVIDIA) · Karsten Kreis (NVIDIA) · Sanja Fidler (University of Toronto) · Young Min Kim (Seoul National University) · Amlan Kar (NVIDIA) | Paper Coming Soon

Condition-Aware Neural Network for Controlled Image Generation

Han Cai (MIT) · Muyang Li (None) · Qinsheng Zhang (Georgia Institute of Technology) · Ming-Yu Liu (NVIDIA) · Song Han (MIT) | Paper

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Ruining Deng (Vanderbilt University) · Quan Liu (Vanderbilt University) · Can Cui (Vanderbilt University) · Tianyuan Yao (Vanderbilt University) · Jialin Yue (Vanderbilt University) · Juming Xiong (Vanderbilt University) · Lining Yu (Vanderbilt University) · Yifei Wu (Vanderbilt University) · Mengmeng Yin (Vanderbilt University) · Yu Wang (Vanderbilt University Medical Center) · Shilin Zhao (Vanderbilt University) · Yucheng Tang (NVIDIA) · Haichun Yang (Vanderbilt University Medical School) · Yuankai Huo (Vanderbilt University) | Paper

NeRFDeformer: NeRF Transformation From a Single View via 3D Scene Flows

Zhenggang Tang (University of Illinois Urbana-Champaign) · Jason Ren (Apple) · Xiaoming Zhao (University of Illinois Urbana-Champaign) · Bowen Wen (NVIDIA) · Jonathan Tremblay (NVIDIA) · Stan Birchfield (NVIDIA) · Alexander G. Schwing (University of Illinois Urbana-Champaign) | Paper Coming Soon

PerAda: Parameter-Efficient Federated Learning Personalization With Generalization Guarantees

Chulin Xie (University of Illinois Urbana-Champaign) · De-An Huang (NVIDIA) · Wenda Chu (Caltech) · Daguang Xu (NVIDIA) · Chaowei Xiao (Arizona State University) · Bo Li (University of Illinois Urbana-Champaign) · Anima Anandkumar (Caltech) | Paper

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

Hancheng Ye (Fudan University) · Chong Yu (Fudan University, NVIDIA) · Peng Ye (Fudan University) · Renqiu Xia (Shanghai Jiao Tong University) · Bo Zhang (Shanghai AI Laboratory) · Yansong Tang (Tsinghua University) · Jiwen Lu (Tsinghua University) · Tao Chen (Fudan University) | Paper

Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

Hoang Chuong Nguyen (Australian National University) · Tianyu Wang (Australian National University) · Jose M. Alvarez (NVIDIA) · Miaomiao Liu (Australian National University) | Paper

GSNeRF: Generalizable Semantic Neural Radiance Fields With Enhanced 3D Scene Understanding

Zi-Ting Chou (National Taiwan University) · Sheng-Yu Huang (National Taiwan University) · I-Jieh Liu (National Taiwan University) · Yu-Chiang Frank Wang (NVIDIA) | Paper

Breathing Life Into Sketches Using Text-to-Video Priors

Rinon Gal (Tel Aviv University, NVIDIA) · Yael Vinker (Tel Aviv University) · Yuval Alaluf (Tel Aviv University) · Amit H. Bermano (Tel Aviv University, Technion) · Daniel Cohen-Or (Google) · Ariel Shamir (Reichman University) · Gal Chechik (NVIDIA, Bar-Ilan University) | Paper

Self-correcting LLM-Controlled Diffusion

Tsung-Han Wu (University of California, Berkeley) · Long Lian (University of California, Berkeley) · Joseph Gonzalez (University of California, Berkeley) · Boyi Li (University of California, Berkeley, NVIDIA) · Trevor Darrell (University of California, Berkeley) | Paper

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

Wujian Peng (Fudan University) · Sicheng Xie (Fudan University) · Zuyao You (Fudan University) · Shiyi Lan (NVIDIA) · Zuxuan Wu (Fudan University) | Paper

Driving Everywhere With Large Language Model Policy Adaptation

Boyi Li (University of California, Berkeley, NVIDIA) · Yue Wang (MIT) · Jiageng Mao (Chinese University of Hong Kong) · Boris Ivanovic (NVIDIA) · Sushant Veer (NVIDIA) · Karen Leung (University of Washington) · Marco Pavone (NVIDIA) | Paper

RegionGPT: Towards Region Understanding Vision Language Model

Qiushan Guo (University of Hong Kong) · Shalini De Mello (NVIDIA Research) · Danny Yin (NVIDIA) · Wonmin Byeon (NVIDIA) · Ka Chun Cheung (NVIDIA) · Yizhou Yu (University of Hong Kong) · Ping Luo (University of Hong Kong) · Sifei Liu (NVIDIA) | Paper

JeDi: Joint-Image Diffusion Models for Fine-Tuning-Free Personalized Text-to-Image Generation

Yu Zeng (None) · Vishal M. Patel (Johns Hopkins University) · Haochen Wang (Toyota Technological Institute at Chicago) · Xun Huang (NVIDIA) · Ting-Chun Wang (NVIDIA) · Ming-Yu Liu (NVIDIA) · Yogesh Balaji (NVIDIA) | Paper Coming Soon

CurveCloudNet: Processing Point Clouds With 1D Structure

Colton Stearns (None) · Alex Fu (Illumix) · Jiateng Liu (University of Illinois Urbana-Champaign) · Jeong Joon Park (Stanford University) · Davis Rempe (NVIDIA) · Despoina Paschalidou (Stanford University) · Leonidas Guibas (Stanford University) | Paper

MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes

Bor Shiun Wang (None) · Chien-Yi Wang (NVIDIA) · Wei-Chen Chiu (None) | Paper Coming Soon

A Unified Approach for Text- and Image-Guided 4D Scene Generation

Yufeng Zheng (ETH Zurich, Max Planck Institute for Intelligent Systems) · Xueting Li (NVIDIA) · Koki Nagano (None) · Sifei Liu (NVIDIA) · Otmar Hilliges (None) · Shalini De Mello (NVIDIA Research) | Paper

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Zhenxin Li (Fudan University) · Shiyi Lan (NVIDIA) · Jose M. Alvarez (NVIDIA) · Zuxuan Wu (Fudan University) | Paper

3DiffTection: 3D Object Detection With Geometry-Aware Diffusion Features

Chenfeng Xu (University of California, Berkeley) · Huan Ling (NVIDIA, University of Toronto) · Sanja Fidler (University of Toronto) · Or Litany (NVIDIA, Technion) | Paper

XCube: Large-Scale 3D Generative Modeling Using Sparse Voxel Hierarchies

Xuanchi Ren (University of Toronto) · Jiahui Huang (None) · Xiaohui Zeng (University of Toronto) · Ken Museth (NVIDIA) · Sanja Fidler (University of Toronto) · Francis Williams (NVIDIA) | Paper

Degree-of-Freedom Matters: Inferring Dynamics From Point Trajectories

Yan Zhang (ETH Zurich) · Sergey Prokudin (ETH Zurich) · Marko Mihajlovic (Swiss Federal Institute of Technology) · Qianli Ma (NVIDIA Research) · Siyu Tang (ETH Zurich) | Paper Coming Soon

AHIVE: Anatomy-Aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval

Sixing Yan (None) · William K. Cheung (Hong Kong Baptist University) · Ivor Tsang (A*STAR) · Wan Hang Keith Chiu (Queen Elizabeth Hospital) · Tong Terence (The Chinese University of Hong Kong) · Ka Chun Cheung (NVIDIA) · Simon See (NVIDIA) | Paper Coming Soon

AM-RADIO: Agglomerative Models—Reduce All Domains Into One

Mike Ranzinger (NVIDIA Research) · Greg Heinrich (NVIDIA) · Jan Kautz (NVIDIA) · Pavlo Molchanov (NVIDIA) | Paper

COLMAP-Free 3D Gaussian Splatting

Yang Fu (University of California, San Diego) · Sifei Liu (NVIDIA) · Amey Kulkarni (NVIDIA) · Jan Kautz (NVIDIA) · Alexei A. Efros (University of California, Berkeley) · Xiaolong Wang (University of California, San Diego) | Paper

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Yunhao Ge (University of Southern California) · Xiaohui Zeng (University of Toronto) · Jacob Huffman (NVIDIA) · Tsung-Yi Lin (NVIDIA) · Ming-Yu Liu (NVIDIA) · Yin Cui (NVIDIA) | Paper

Addressing Background Context Bias in Few-Shot Segmentation Through Iterative Modulation

Lanyun Zhu (Singapore University of Technology and Design) · Tianrun Chen (Zhejiang University) · Jianxiong Yin (NVIDIA) · Simon See (NVIDIA) · Jun Liu | Paper Coming Soon

PARA-Drive: Parallelized Architecture for Real-Time Autonomous Driving

Xinshuo Weng (NVIDIA) · Boris Ivanovic (NVIDIA) · Yan Wang (NVIDIA) · Yue Wang (MIT) · Marco Pavone (NVIDIA) | Paper Coming Soon

Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

Zhiqi Li (Nanjing University) · Zhiding Yu (NVIDIA) · Shiyi Lan (NVIDIA) · Jiahan Li (Nanjing University) · Jan Kautz (NVIDIA) · Tong Lu (Nanjing University) · Jose M. Alvarez (NVIDIA) | Paper

Improving Distant 3D Object Detection Using 2D Box Supervision

Zetong Yang (The Chinese University of Hong Kong) · Zhiding Yu (NVIDIA) · Christopher Choy (Stanford University) · Renhao Wang (University of California, Berkeley) · Anima Anandkumar (Caltech) · Jose M. Alvarez (NVIDIA) | Paper

PACER+: On-Demand Pedestrian Animation Controller in Driving Scenarios

Jingbo Wang (Shanghai AI Laboratory) · Zhengyi Luo (Carnegie Mellon University) · Ye Yuan (NVIDIA Research) · Yixuan Li (The Chinese University of Hong Kong) · Bo Dai (Shanghai AI Laboratory) | Paper Coming Soon

RGBD Objects in the Wild: Scaling Real-World 3D Object Learning From RGB-D Videos

Hongchi Xia (Shanghai Jiao Tong University) · Yang Fu (University of California, San Diego) · Sifei Liu (NVIDIA) · Xiaolong Wang (University of California, San Diego) | Paper

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

Jianjian Cao (Fudan University) · Peng Ye (Fudan University) · Shengze Li (Fudan University) · Chong Yu (Fudan University, NVIDIA) · Yansong Tang (Tsinghua University) · Jiwen Lu (Tsinghua University) · Tao Chen (Fudan University) | Paper


Deep Dive

Training Computer Vision Models With Synthetic Data

Staying in Sync: NVIDIA Combines Digital Twins With Real-Time AI for Industrial Automation

NVIDIA Omniverse™, Metropolis, Isaac™, and cuOpt™ interact in AI gyms where developers can train AI agents to help robots and humans navigate unpredictable or complex events.

Toronto AI Lab

NVIDIA’s Toronto AI Research Group brings together researchers passionate about the intersection of computer vision, machine learning, and computer graphics.

NVIDIA VILA

VILA is a visual language model that brings visual information into large language models (LLMs). To leverage powerful LLMs, VILA uses a visual encoder to encode images or video as visual tokens and then inputs these tokens into the LLM as if they’re a foreign language.
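The idea described above can be illustrated with a toy sketch. This is not VILA's actual code or API: the function names, the tiny dimensions, and the random numbers standing in for a trained visual encoder and projection are all assumptions made for illustration. It only shows the general pattern of mapping visual features into the LLM's embedding space and placing them in the same sequence as the text tokens.

```python
import random

# Hypothetical sizes, chosen only to keep the example small.
D_VIS = 4     # width of a visual feature vector produced by the encoder
D_MODEL = 8   # width of the LLM's token embeddings

def project_visual_features(features, proj):
    """Map each visual feature vector (length D_VIS) to a 'visual token'
    of length D_MODEL by multiplying with a D_VIS x D_MODEL matrix."""
    d_out = len(proj[0])
    return [
        [sum(f[i] * proj[i][j] for i in range(len(f))) for j in range(d_out)]
        for f in features
    ]

def build_llm_input(visual_tokens, text_tokens):
    """Place visual tokens ahead of the text token embeddings so the LLM
    reads them as one sequence."""
    return visual_tokens + text_tokens

rng = random.Random(0)
# Random stand-ins for a trained projection, encoder output, and text embeddings.
proj = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(D_VIS)]
image_features = [[rng.uniform(-1, 1) for _ in range(D_VIS)] for _ in range(3)]
text_tokens = [[rng.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(5)]

visual_tokens = project_visual_features(image_features, proj)
sequence = build_llm_input(visual_tokens, text_tokens)
print(len(sequence), len(sequence[0]))  # 8 8
```

Here three image patches and five text tokens yield a single eight-token sequence in which every entry has the LLM's embedding width, which is what lets the language model treat the visual tokens like words from another language.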

Featured Demos

Explore how NVIDIA technologies are transforming a variety of industries with powerful demos that highlight the latest breakthroughs in AI, data science, graphics, healthcare, and more.

Transform Edge AI Applications With Generative AI

A generative AI application is one of the reference applications offered with NVIDIA Metropolis microservices for NVIDIA Jetson™, a suite of cloud-native building blocks for developing edge AI applications and solutions.

Fusing Real Time AI With Digital Twins

Discover the AI that'll drive the next phase of industrial automation—how it'll be developed, refined, and first deployed in simulation via digital twins.

NVIDIA Visual Insight Agent (VIA)

This new generation of visual AI agents will help nearly every industry summarize, search, and extract actionable insights from video using natural language.

Techman Robots Using Isaac and AI Vision

Techman developed robotic automatic optical inspection (AOI) solutions by using NVIDIA Isaac Sim™ to simulate, test, and optimize their state-of-the-art collaborative robots (cobots).

NVIDIA Research AI Playground

Discover our most recent AI research and the new capabilities deep learning brings to visual and audio applications. Explore the latest innovations and see how you can bring them into your own work.

Resources

Free Computer Vision Course

Join our free NVIDIA Developer Program to access training, resources, and tools that can accelerate your work and advance your skills. Get a free credit for our self-paced course, Synthetic Data Generation for Training Computer Vision Models, when you join.

NVIDIA Deep Learning Institute

Develop practical skills and validate your expertise with hands-on, self-paced courses, instructor-led workshops, and technical certifications.

NVIDIA Inception for Startups

Explore the program that provides cutting-edge startups around the world with critical access to go-to-market support, technical expertise, training, and funding opportunities.
