NVIDIA NIM Microservices

Designed for rapid, reliable deployment of accelerated generative AI inference anywhere.

Get Started

Video | Solution Brief | For Developers

Overview
Benefits
Benchmarks
Models
Features
Use Cases
Starting Options
Resources
Next Steps

Overview

Overview
Benefits
Benchmarks
Models
Features
Use Cases
Starting Options
Resources
Next Steps

Get Started

Overview

What Is NVIDIA NIM?

NVIDIA NIM™ provides prebuilt, optimized inference microservices that let you deploy the latest AI foundation models with security and stability on any NVIDIA-accelerated infrastructure— cloud, data center, and workstation.

Generative AI Inference Powered by NVIDIA NIM: Performance and TCO

See how NIM microservices outperform popular alternatives, delivering up to 3x more tokens-per-second throughput when running on the same NVIDIA-accelerated infrastructure.

Watch Now

Free Developer Access to NIM

Join the NVIDIA Developer Program to download NIM microservices for free and get started developing, testing, and researching.

Join Now

Generative AI Deployment, Accelerated With NVIDIA NIM

NVIDIA NIM combines the ease of use and operational simplicity of managed APIs with the flexibility and security of self-hosting models on your preferred infrastructure. NIM microservices come with everything AI teams need—the latest AI foundation models, optimized inference engines, industry-standard APIs, and runtime dependencies—prepackaged in enterprise-grade software containers ready to deploy and scale anywhere.

Benefits

Generative AI for Enterprises That Does More for Less

Get the best of both worlds with easy, enterprise-grade microservices built for high-performance AI—designed to work seamlessly and scale affordably. Experience the fastest time to value for enterprise AI agents and other generative AI domains such as reasoning, simulation, speech, and more.

Ease of Use

Accelerate time to market with optimized, prebuilt, cloud-native microservices and empower enterprise developers with industry-standard APIs and tools tailored for enterprise needs.

Watch: Deploy NIM in 5 Minutes

Security and Manageability

Securely control generative AI applications and data with self-hosted deployment on your choice of infrastructure. Take advantage of enterprise-grade support, including dedicated feature branches, rigorous validation processes, and direct access to NVIDIA AI experts.

Performance and Scale

Improve TCO with low-latency, high-throughput AI inference that scales in the cloud, and achieve the best accuracy with support for fine-tuned models out of the box.

Watch: NIM Performance and TCO Advantage

Portability

Deploy anywhere with prebuilt microservices ready to run on any NVIDIA-accelerated infrastructure—cloud, data center, and workstation—and scale seamlessly on Kubernetes and cloud service provider environments.

Benchmarks

Boost Throughput With NIM

NVIDIA NIM provides optimized throughput and latency out of the box to maximize token generation, support concurrent users at peak times, and improve responsiveness.

Configuration: Llama 3.1 8B instruct, 1x H100 SXM; input 20000 tokens, output 2000 tokens, concurrent requests: 200. NIM ON: FP8, throughput 1201 tokens/s, ITL 32ms. NIM OFF: FP8, throughput 613 tokens/sec, ITL 37ms.

Models

Build With the Leading Open Models

Get optimized inference performance for the latest AI foundation models. NIM comes with accelerated inference engines from NVIDIA and the community, including NVIDIA® TensorRT™, TensorRT-LLM, and more—prebuilt and optimized for low-latency, high-throughput inferencing on NVIDIA-accelerated infrastructure.

View More Models

Features

Enterprise Generative AI Is Just an API Call Away

Designed to run anywhere, NIM inference microservices expose industry-standard APIs for easy integration with enterprise systems and applications and scale seamlessly on Kubernetes to deliver high-throughput, low-latency inference at cloud scale.

Deploy NIM

Deploy NIM for your model with a single command. You can also easily run NIM with fine-tuned models.

Run Inference

Get NIM up and running with the optimal runtime engine based on your NVIDIA-accelerated infrastructure.

Build

Integrate self-hosted NIM endpoints with just a few lines of code.

Deploy

Run

Build

docker run nvcr.io/nim/publisher_name/model_name

 
 curl -X 'POST' \ 
  'http://0.0.0.0:8000/v1/completions' \ 
  -H 'accept: application/json' \ 
  -H 'Content-Type: application/json' \ 
  -d '{ 
  "model" : "model_name", 
  "prompt" : "Once upon a time", 
  "max_tokens" : 64 
 }' 

 
 import openai 
 client = openai.OpenAI( 
  base_url = "YOUR_LOCAL_ENDPOINT_URL", 
  api_key="YOUR_LOCAL_API_KEY" 
 ) 
 chat_completion = client.chat.completions.create( 
  model="model_name", 
  messages=[{"role" : "user" , "content" : "Write me a love song" }], 
  temperature=0.7 
 ) 

Use Cases

How NIM Is Being Used

See how NVIDIA NIM supports industry use cases, and jump-start your AI development with curated examples.

AI Virtual Assistants
Document Intelligence
Hyperpersonalized Shopping
3D Product Configurators

AI Virtual Assistants

Enhance customer experiences and improve business processes with generative AI.

Learn About AI for Customer Support

Build Now

Virtual human in a virtual chat session.

Intelligent Document Processing

Use generative AI to accelerate and automate document processing.

Learn About Intelligent Document Processing

Build Now

AI for Hyperpersonalized Shopping

Deliver tailored experiences that enhance customer satisfaction with the power of AI.

Learn About Hyperpersonalized Shopping

Build Now

3D Product Configurators

Use OpenUSD and generative AI to develop and deploy 3D product configurator tools and experiences to nearly any device.

Learn About 3D Product Configurators

3D car created using OpenUSD and generative AI

Starting Options

Ways to Get Started With NVIDIA NIM

Start Prototyping for Free

Get started with easy-to-use, NVIDIA-managed serverless APIs.

Access fully accelerated AI infrastructure.
Ensure your data isn't used for model training.
Get started for free with 1,000 inference credits.

Build Now

Download and Deploy

Run NVIDIA NIM to scale optimized AI models in the cloud or data center of your choice.

Ensure data never leaves your secure enclave.
Seamlessly transition from cloud endpoints to self-hosted APIs without code changes.
Use an NVIDIA AI Enterprise license for production, or get started for free with the NVIDIA Developer Program.

Deploy Now

Get in Touch

Talk to an NVIDIA AI specialist about moving generative AI pilots to production with the security, API stability, and support that comes with NVIDIA AI Enterprise.

Explore your generative AI use cases.
Discuss your technical requirements.
Align NVIDIA AI solutions to your goals and requirements.

Contact Sales

Build Now

Resources

The Latest NVIDIA NIM Resources

Blogs
Sessions
Courses
Videos

NVIDIA NIM in the News

See All Tech Blogs See All Topic News

View All Sessions

Introduction to NVIDIA NIM Microservices

Learn how NIM enables the building, deploying, and scaling of AI applications.

View Course

Sizing LLM Inference Systems

Learn how to optimize and deploy large language models using NIM microservices for real-world applications.

View Course

Developing an AI Background Generator With NIM

Review the process of creating an AI-enabled NVIDIA Omniverse™ Kit-based application. You’ll learn how to use Omniverse extensions, NIM microservices, and Python code to add an extension capable of generating backgrounds from text input.

View Course

View All Courses

How to Build a Simple AI Agent in 5 Minutes With NVIDIA NIM

See how to set up two AI agents—one for content generation and another for digital graphic design.

Watch Now (04:09)