Developer Webinar Series
AI inference, the process of running trained models to generate predictions, has become a major focus in AI development, especially with the rise of large language models (LLMs) like GPT, Llama, and Gemini. Unlike training, which happens in high-performance data centers, AI inference must be efficient, scalable, and cost-effective for real-world applications. To use AI inference successfully, organizations need a full-stack approach that supports the end-to-end AI life cycle, along with tools that empower teams to meet their goals.
In this webinar, we'll take a detailed look at LLM inference and optimization, including theory, model architecture, and mathematical foundations. You'll learn about prompt processing, token generation, and inference optimization with TensorRT-LLM. We'll also discuss how to measure inference performance, including latency and throughput. The NVIDIA AI Enterprise software platform consists of NVIDIA NIM™ microservices, NVIDIA Dynamo, the NVIDIA® TensorRT™ ecosystem, and other tools that simplify building, sharing, and deploying AI applications. With enterprise-grade support, stability, manageability, and security, enterprises can accelerate time to value while eliminating unplanned downtime. Register for this webinar to explore the benefits of NVIDIA AI for accelerated inference. Time will be available at the end of the webinar for Q&A.
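To make the latency and throughput metrics mentioned above concrete, here is a minimal sketch of how a token-by-token generation loop is commonly timed. The `measure_generation` helper and the stand-in "model" are illustrative assumptions, not part of TensorRT-LLM or any NVIDIA API:

```python
import time

def measure_generation(generate_token, prompt_tokens, max_new_tokens):
    """Time an autoregressive decode loop and report common inference metrics."""
    start = time.perf_counter()
    timestamps = []
    for _ in range(max_new_tokens):
        generate_token(prompt_tokens)          # hypothetical per-token decode step
        timestamps.append(time.perf_counter())
    ttft = timestamps[0] - start               # time to first token (includes prompt processing)
    total = timestamps[-1] - start             # end-to-end generation latency
    tpot = (total - ttft) / max(len(timestamps) - 1, 1)  # average time per output token
    throughput = len(timestamps) / total       # output tokens per second
    return {"ttft_s": ttft, "total_s": total, "tpot_s": tpot, "tokens_per_s": throughput}

# Usage with a stand-in "model" that sleeps to mimic per-token decode latency:
metrics = measure_generation(lambda _: time.sleep(0.01),
                             prompt_tokens=[1, 2, 3], max_new_tokens=20)
print(f"TTFT: {metrics['ttft_s']*1000:.1f} ms, "
      f"throughput: {metrics['tokens_per_s']:.1f} tok/s")
```

In a real deployment the same timing would wrap the actual decode call of the serving framework; time to first token and time per output token are usually tracked separately, since the first reflects prompt processing and the second steady-state generation.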
This webinar is intended for developers with Python proficiency and a solid understanding of LLM theory who are working in generative AI and want to deepen their knowledge of LLM inference and optimization.
In this webinar, you'll gain a deep understanding of LLM inference theory and applications, learning about:
- Prompt processing and token generation
- Inference optimization with TensorRT-LLM
- Measuring inference performance, including latency and throughput
Don't miss this opportunity to enhance your expertise or master a new technology. Plus, receive a free NVIDIA self-paced training course (valued at up to USD 90) when you attend this webinar.
Watch how Tech Mahindra is using NVIDIA's hardware and software stack to build the Nemotron-4-Mini-Hindi-4B model. Tech Mahindra's work on Indus 2.0, as well as its collaboration with Indonesia on Bahasa, is state of the art and built on NVIDIA AI Inference software.