nvidia

Build an Enterprise RAG pipeline

Connect AI applications to multimodal enterprise data with a retrieval augmented generation (RAG) pipeline.

llama-3_1-70b-instruct•llama-3_2-nv-embedqa-1b-v2•llama-3_2-nv-rerankqa-1b-v2•nemoretriever-page-elements-v2•nemoretriever-table-structure-v1•nemoretriever-graphic-elements-v1•paddleocr

blueprint nim nemo retriever retrieval-augmented generation nvidia ai

View Source Code Deploy Launchable

The NVIDIA AI Blueprint for RAG gives developers a foundational starting point for building scalable, customizable retrieval pipelines that deliver both high accuracy and throughput. Use this blueprint to create RAG applications that provide context-aware responses by connecting LLMs to extensive multimodal enterprise data—an essential capability for most generative AI use cases.

This blueprint can be utilized as-is, combined with other NVIDIA Blueprints, such as the Digital Human Blueprint or the AI Virtual Assistant for customer service, or integrated with an agent to support more advanced use cases. Get started with this reference architecture to unlock actionable insights, ground your decisions in relevant data, and boost overall productivity.

Architecture Diagram

Key Features

Multimodal data extraction support with text, tables, charts, and infographics
Hybrid search with dense and sparse search
Opt-in image captioning with vision language models (VLMs)
Reranking to further improve accuracy
GPU-accelerated Index creation and search
Multi-turn conversations
Multi-session support
Telemetry and observability
Opt-in for reflection to improve accuracy
Opt-in for guardrailing conversations
Sample user interface
OpenAI-compatible APIs
Decomposable and customizable

Minimum System Requirements

Hardware Requirements The blueprint offers two primary modes of deployment. By default, it deploys the referenced NIM microservices locally. Each method lists its minimum required hardware. This will change if the deployment turns on optional configuration settings.

Docker
- 4xH100 or 6xA100
Kubernetes
- 9xH100 or 11XA100
The blueprint provides the alternative to use NGC-hosted models, in which case one GPU will be required to host the NVIDIA cuVS-accelerated vector database.
The blueprint can be modified to use additional NIM microservices hosted by NVIDIA.

OS Requirements

Ubuntu 22.04 OS

Deployment Options

Docker
Kubernetes

Software used in this blueprint

NVIDIA Technology

NeMo Retriever Llama 3.2 Embedding NIM
NeMo Retriever Llama 3.2 Reranking NIM
Llama 3.1 70B Instruct NIM
NeMo Retriever Page Elements NIM
NeMo Retriever Table Structure NIM
NeMo Retriever Graphic Elements NIM
PaddleOCR NIM
NeMo Retriever Parse NIM (optional)
Llama 3.1 NemoGuard 8B Content Safety NIM (optional)
Llama 3.1 NemoGuard 8B Topic Control NIM (optional)
Llama 3.2 11B Vision Instruct NIM (optional)
Mixtral 8x22B Instruct 0.1 (optional)

3rd Party Software

LangChain
Milvus database (accelerated with NVIDIA cuVS)

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure the models meet requirements for the relevant industry and use case and address unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI concerns here.

License

Use of the models in this blueprint is governed by the NVIDIA AI Foundation Models Community License.

Terms of Use

This blueprint is governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Software License Agreement and the NVIDIA Agreements | Enterprise Software | Product Specific Terms for AI Product. The models are governed by the NVIDIA Agreements | Enterprise Software | NVIDIA Community Model License and the NVIDIA RAG dataset which is governed by the NVIDIA Asset License Agreement.

The following models that are built with Llama are governed by the Llama 3.2 Community License Agreement: llama-3.1-70b-instruct, nvidia/llama-3.2-nv-embedqa-1b-v2, and nvidia/llama-3.2-nv-rerankqa-1b-v2.