Advancements in Large Language Models (LLMs) have enabled developers to build a variety of applications such as code generation, translation, and text summarization. The effectiveness of these models depends heavily on the quality of their training data. Data gathered from public sources often contains duplicates, personally identifiable information (PII), and low-quality content, so it must be processed and filtered before downstream tasks such as training or fine-tuning LLMs. Synthetic data can also be generated to train LLMs or to evaluate Retrieval-Augmented Generation (RAG) applications. In this talk, we'll show how to use NeMo Curator and functionality from the NVIDIA RAPIDS libraries to accelerate these data processing tasks. We'll walk through examples in Python Jupyter notebooks and discuss best practices for large-scale data curation pipelines.
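To give a flavor of one curation step mentioned above, here is a minimal, illustrative sketch of exact deduplication by content hashing. This is plain Python, not NeMo Curator's or RAPIDS' actual API; at scale, libraries like NeMo Curator apply the same idea (plus fuzzy matching) with GPU acceleration.

```python
import hashlib

def exact_dedup(docs):
    """Keep the first occurrence of each unique document.

    Documents are matched by an MD5 digest of their normalized text
    (lowercased, whitespace-stripped) -- a simple stand-in for the
    hash-based exact deduplication used in large-scale pipelines.
    """
    seen = set()
    unique = []
    for doc in docs:
        digest = hashlib.md5(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["Hello world.", "hello world.", "A different document."]
print(exact_dedup(corpus))  # near-identical copies collapse to one entry
```

Hashing each document once keeps the pass linear in corpus size, which is why exact deduplication is typically the cheapest first filter before more expensive steps like PII detection or quality scoring.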