Optimizing and Scaling LLMs With TensorRT-LLM for Text Generation
Senior Solution Architect, NVIDIA
Machine Learning Engineer, Grammarly
Machine Learning Engineer, Grammarly
The landscape of large language models (LLMs) is evolving quickly. As parameter counts and model sizes grow, optimizing and deploying LLMs for inference becomes increasingly complex. This calls for a framework with a well-designed, easily extensible API that frees developers from low-level memory management and hand-written CUDA calls. Learn how we used NVIDIA's suite of solutions to optimize LLMs and deploy them in multi-GPU environments.
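As a minimal sketch of that workflow, the snippet below uses TensorRT-LLM's high-level Python `LLM` API to build an engine and run text generation sharded across GPUs. The model name, the `tensor_parallel_size=2` setting, and the sampling parameters are illustrative assumptions, not the configuration discussed in the session.

```python
# A minimal sketch, assuming TensorRT-LLM is installed and the listed
# checkpoint is accessible; all values below are illustrative.
from tensorrt_llm import LLM, SamplingParams

# Build (or load a cached) TensorRT engine for the model.
# tensor_parallel_size shards the weights across GPUs for multi-GPU inference.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # hypothetical example checkpoint
    tensor_parallel_size=2,
)

prompts = ["The future of LLM inference is"]
params = SamplingParams(max_tokens=64, temperature=0.8)

# Generate completions; each result carries the decoded text.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The high-level API handles engine building, KV-cache memory management, and kernel selection internally, which is the kind of abstraction the abstract argues for over manual CUDA-level work.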