Developer Webinar Series
AI inference, the process of running trained models to generate predictions, has become a major focus in AI development, especially with the rise of large language models (LLMs) like GPT, Llama, and Gemini. Unlike training, which happens in high-performance data centers, AI inference must be efficient, scalable, and cost-effective for real-world applications. To use AI inference successfully, organizations need a full-stack approach that supports the end-to-end AI life cycle, along with tools that empower teams to meet their goals.
In this webinar, we'll take a detailed look at LLM inference and optimization, including theory, model architecture, and mathematical foundations. You'll learn about prompt processing, token generation, and inference optimization with TensorRT-LLM. We'll also discuss how to measure inference performance, including latency and throughput. The NVIDIA AI Enterprise software platform consists of NVIDIA NIM™ microservices, NVIDIA Dynamo, the NVIDIA® TensorRT™ ecosystem, and other tools that simplify building, sharing, and deploying AI applications. With enterprise-grade support, stability, manageability, and security, enterprises can accelerate time to value while eliminating unplanned downtime. Register for this webinar to explore the benefits of NVIDIA AI for accelerated inference. Time will be available at the end of the webinar for Q&A.
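To make the latency and throughput metrics mentioned above concrete, here is a minimal sketch of how a token-by-token generation loop is commonly timed. The `measure_generation` helper and the stand-in "model" are illustrative assumptions, not part of TensorRT-LLM or any NVIDIA API:

```python
import time

def measure_generation(generate_token, prompt_tokens, max_new_tokens):
    """Time an autoregressive decode loop and report common inference metrics."""
    start = time.perf_counter()
    timestamps = []
    for _ in range(max_new_tokens):
        generate_token(prompt_tokens)          # hypothetical per-token decode step
        timestamps.append(time.perf_counter())
    ttft = timestamps[0] - start               # time to first token (includes prompt processing)
    total = timestamps[-1] - start             # end-to-end generation latency
    tpot = (total - ttft) / max(len(timestamps) - 1, 1)  # average time per output token
    throughput = len(timestamps) / total       # output tokens per second
    return {"ttft_s": ttft, "total_s": total, "tpot_s": tpot, "tokens_per_s": throughput}

# Usage with a stand-in "model" that sleeps to mimic per-token decode latency:
metrics = measure_generation(lambda _: time.sleep(0.01),
                             prompt_tokens=[1, 2, 3], max_new_tokens=20)
print(f"TTFT: {metrics['ttft_s']*1000:.1f} ms, "
      f"throughput: {metrics['tokens_per_s']:.1f} tok/s")
```

In a real deployment the same timing would wrap the actual decode call of the serving framework; time to first token and time per output token are usually tracked separately, since the first reflects prompt processing and the second steady-state generation.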
This webinar is intended for developers with Python proficiency and a solid understanding of LLM theory who are working in generative AI and want to deepen their knowledge of LLM inference and optimization.
In this webinar, you'll gain a deep understanding of LLM inference theory and applications, learning about:
- Prompt processing and token generation
- Inference optimization with TensorRT-LLM
- Measuring inference performance, including latency and throughput
Don't miss this opportunity to enhance your expertise or master a new technology. Plus, receive a free NVIDIA self-paced training course (valued at up to USD 90) when you attend this webinar.
Watch how Tech Mahindra is using NVIDIA's hardware and software stack to build the Nemotron-4-Mini-Hindi-4B model. Tech Mahindra's work on Indus 2.0, as well as its collaboration with Indonesia on Bahasa, is state of the art and built on NVIDIA AI Inference software.