What is Physical AI?

Physical AI lets autonomous systems like robots, self-driving cars, and smart spaces perceive, understand, and perform complex actions in the real (physical) world. It’s also often referred to as “generative physical AI” because of its ability to generate insights and actions.

How Does Physical AI Work?

Generative AI models—large language models such as GPT and Llama—are trained on enormous amounts of text and image data, largely gathered from the Internet. These AIs have astonishing capabilities in producing human language and abstract concepts, but they're limited in their grasp of the physical world and its rules.

Generative physical AI extends current generative AI with an understanding of spatial relationships and the physical behavior of the 3D world we all live in. This is done by providing additional data that contains information about the spatial relationships and physical rules of the real world during the AI training process.

The 3D training data is generated from highly accurate computer simulations, which serve as both a data source and an AI training ground.

Physically based data generation starts with a digital twin of a space, such as a factory. In this virtual space, sensors and autonomous machines like robots are added. Simulations that mimic real-world scenarios are performed, and the sensors capture various interactions like rigid body dynamics—such as movement and collisions—or how light interacts in an environment.

What Is the Role of Reinforcement Learning in Physical AI?

Reinforcement learning teaches autonomous machines skills in a simulated environment to perform in the real world. It allows autonomous machines to learn skills safely and quickly through thousands or even millions of acts of trial and error.

This learning technique rewards a physical AI model for successfully completing desired actions in the simulation, so the model continuously adapts and improves. With repeated reinforcement learning, autonomous machines eventually adapt to new situations and unforeseen challenges appropriately, preparing them to operate in the real world. Over time, an autonomous machine can develop sophisticated fine motor skills needed for real-world applications, such as neatly packing boxes, helping to build vehicles, or navigating environments unassisted.

Why Is Physical AI Important?

Previously, autonomous machines were unable to perceive and sense the world around them. But with generative physical AI, robots can be built and trained to seamlessly interact with, and adapt to, their surroundings in the real world.

To build physical AI, teams need powerful, physics-based simulations that provide a safe, controlled environment for training autonomous machines. This not only enhances the efficiency and accuracy of robots in performing complex tasks, but also facilitates more natural interactions between humans and machines, improving accessibility and functionality in real-world applications.

Generative physical AI is unlocking new capabilities that will transform every industry. For example:

Robots: With physical AI, robots demonstrate significant advancements in operational capabilities within various settings.

Autonomous Mobile Robots (AMRs) in warehouses can navigate complex environments and avoid obstacles, including humans, by using direct feedback from onboard sensors.
Manipulators can adjust their grasping strength and position based on the pose of objects on a conveyor belt, showcasing both fine and gross motor skills tailored to the object type.
Surgical robots benefit from this technology by learning intricate tasks such as threading needles and performing stitches, highlighting the precision and adaptability of generative physical AI in training robots for specialized tasks.
Humanoid robots — general-purpose robots — need both gross and fine motor skills, as well as the ability to perceive, understand, navigate, and interact with the physical world, no matter what the given task is.

Autonomous Vehicles (AVs): AVs use sensors to perceive and understand their surroundings, enabling them to make informed decisions in various environments, from open freeways to urban cityscapes. Training AVs on physical AI lets them more accurately detect pedestrians, respond to traffic or weather conditions, and autonomously navigate lane changes, effectively adapting to a wide range of unexpected scenarios.

Smart Spaces: Physical AI is enhancing the functionality and safety of large indoor and outdoor spaces like factories and warehouses, where daily activities involve a steady traffic of people, vehicles, and robots. Using fixed cameras and advanced computer vision models, teams can enhance dynamic route planning and optimize operational efficiency by tracking multiple entities and activities within these spaces. Video analytics AI agents further improve safety and efficiency by automatically detecting anomalies and providing real-time alerts.

How Can You Get Started With Physical AI?

Building the next generation of autonomous systems using physical AI involves a coordinated process across multiple, specialized computers.

Construct a virtual 3D environment: A high-fidelity, physically based virtual environment is needed to represent the real environment and generate the synthetic data necessary for training physical AI. NVIDIA Omniverse™ is a platform of APIs, SDKs, and services that lets you easily integrate Universal Scene Description (OpenUSD) and NVIDIA RTX™ rendering technologies into existing software tools and simulation workflows to build these 3D environments. This environment is supported by NVIDIA OVX™ systems. This step also includes capturing large-scale scenes or data that are needed for simulation or model training. A key technological breakthrough that has allowed for efficient AI model training and inference with rich 3D datasets is called fVDB, which efficiently represents features and is an extension to PyTorch. It allows for deep learning operations to be performed on large-scale 3D data.
Generate 3D-to-real synthetic data: Use Omniverse Replicator SDK for environment and object domain randomization. Render the randomized scenes as images or videos, then use NVIDIA Cosmos™ models for 3D-to-real photoreal video generation to further multiply the dataset.
Train and validate: NVIDIA DGX™ is a fully integrated hardware and software AI platform. It can be used with physically based data to train or fine-tune AI models with frameworks such as TensorFlow, PyTorch, or NVIDIA TAO, along with pretrained computer vision models available on NVIDIA NGC. Once trained, the model and its software stack can be validated in simulation using reference applications like NVIDIA Isaac Sim™. You can also use open-source frameworks, such as Isaac Lab, to refine the skills of the robot using reinforcement learning.
Deploy: Finally, the optimized stack and policy model can be deployed via NVIDIA Blueprints on NVIDIA Jetson™ or NVIDIA DRIVE AGX™ to run embedded in the autonomous robot, vehicle, or smart space. For example, you can deploy “Mega”, an NVIDIA Omniverse Blueprint to simulate factory operations and build video analytics AI agents with Metropolis AI Blueprint for video search and summarization to provide oversight on factory performance and safety.

Next Steps

Accelerate Physical AI Development

The development of physical-AI-embodied systems such as robots and autonomous vehicles is accelerated with the NVIDIA Cosmos platform.

Learn More About Cosmos

Advance AI Workflows

Discover how synthetic data can be used for training various physical AI models used in autonomous vehicles, industrial inspection, and robots.

Learn More About Synthetic Data

Train and Validate AI Robots

Explore Isaac Sim to design, simulate, test, and train AI-based robots in a physically based virtual environment.

Learn More About NVIDIA Isaac Sim