Triton GTC Sessions
37 sessions
March 2022
Developers wanting to build full-stack intelligent applications powered by AI are creatively constrained by deficiencies in today’s MLOps workflows. The complexity is two-fold: (1) There isn’t a fast path for getting the latest deep learning innovations, such as transformer models, quickly integrated into …
March 2022
Falkonry’s AI exploits the computational power of GPUs to provide real-time insights across trillions of high-speed data points. AI for defense and industrial operations is constrained by two factors: speed and scale. Using an orderly three-way brokerage between data, models, and GPU resources, supported …
March 2022
We'll cover the end-to-end story, starting with building a prototype with PyTorch on GCP Vertex AI Workbench and going all the way to production deployment on Vertex AI Prediction with NVIDIA Triton. The most important part of the talk: we won't have to rewrite code after having a Jupyter Notebook with …
March 2022
We'll discuss the Triton Model Analyzer tool and its role in generating the optimal model configuration for hosting on AzureML. We'll cover optimizations of various hyperparameters, along with framework-specific optimizations that can be swept using Model Analyzer. The reports generated …
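For flavor, here is a toy sketch of the kind of configuration sweep Model Analyzer automates; Model Analyzer itself is driven from its own CLI and does far more (instance-count and dynamic-batching sweeps, report generation). The server URL, model name, and tensor name below are hypothetical.

```python
# Toy sweep: measure single-request latency at several batch sizes against a
# running Triton server, then report the best latency per sample.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def latency(model_name, batch):
    data = np.random.rand(batch, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("input__0", list(data.shape), "FP32")  # hypothetical tensor name
    inp.set_data_from_numpy(data)
    start = time.perf_counter()
    client.infer(model_name, [inp])
    return time.perf_counter() - start

results = {b: latency("resnet50", b) for b in (1, 2, 4, 8)}  # hypothetical model name
best = min(results.items(), key=lambda kv: kv[1] / kv[0])
print(f"best batch size: {best[0]} ({best[1] * 1e3:.1f} ms per request)")
```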
March 2022
Auto insurers in the United States alone are losing almost $30 billion annually due to errors in damage estimation. The poor quality of vehicle images is a major contributing factor. Identifying grainy photo uploads in claims submission is a practical challenge. Failure to accurately identify noisy images may …
March 2022
Learn how we can use natural language processing for enterprise search systems, leveraging the unmatched abilities of transformer models and NVIDIA infrastructure for semantic search and question answering. We'll take a deep dive into how transformers, using the power of …
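As a minimal illustration of transformer-based semantic search (not the session's own stack; the model name and corpus below are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder model choice
docs = ["How do I reset my password?",
        "Expense report submission policy",
        "Setting up the corporate VPN"]
doc_emb = model.encode(docs, convert_to_tensor=True)

# Embed the query and retrieve the nearest document by cosine similarity.
query_emb = model.encode("I forgot my login credentials", convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=1)
print(docs[hits[0][0]["corpus_id"]])
```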
March 2022
Triton Inference Server is a full-featured, extensible, and powerful inferencing solution for both edge and cloud. When deploying Triton to production in the cloud, efficiency, scalability, and integration with infrastructure beyond the server itself must be taken into consideration. We'll …
March 2022
With GPT-3, it's impossible to run inference for the entire model on a single GPU; we must extend to multi-GPU, or even multi-node, serving. We'll demonstrate how to integrate FasterTransformer, a highly optimized and flexible transformer library, with Triton inference …
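For intuition, here is a conceptual PyTorch sketch of column-parallel matrix multiplication, the core idea behind tensor-parallel inference of large transformers. This is not FasterTransformer code, and it assumes two visible GPUs.

```python
import torch

def column_parallel_linear(x, weight_shards):
    # Each weight shard lives on its own GPU; each GPU computes a slice of
    # the output, and the slices are gathered and concatenated at the end.
    parts = [x.to(w.device) @ w for w in weight_shards]
    return torch.cat([p.to(parts[0].device) for p in parts], dim=-1)

hidden = 1024
w = torch.randn(hidden, 4 * hidden)
# Split the weight column-wise across two devices.
shards = [w[:, :2 * hidden].to("cuda:0"), w[:, 2 * hidden:].to("cuda:1")]
x = torch.randn(8, hidden, device="cuda:0")
y = column_parallel_linear(x, shards)  # equals x @ w, computed across 2 GPUs
```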
March 2022
LINE has developed a fast and accurate speech recognition service using a state-of-the-art streaming end-to-end (E2E) model. Our service is required to return recognition results with low latency for each speech request posted by many clients. However, it's hard to satisfy that requirement using the E2E …
March 2022
In this session, we focus on optimized deep learning inference and deployment of AI models in production. We'll discuss how NVIDIA TensorRT, an SDK for high-performance deep learning inference, can deliver low latency and high throughput for deep learning applications. We'll then talk …
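As a taste of the TensorRT workflow, here is a minimal sketch that builds an FP16 engine from an ONNX model with the TensorRT Python API; it is written against the TensorRT 8.x API, and the file names are placeholders.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # enable FP16 for lower latency

engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:          # deployable engine file
    f.write(engine)
```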
March 2022
Merlin HugeCTR is a recommender system-specific framework which accelerates the training and deployment of complex deep learning models on GPUs at scale. Since its public release in early 2020, we've added a lot of enhancements to performance and usability. We'll introduce some of them, …
March 2022
Learn how LinkedIn's model serving has evolved to support deep learning with TensorFlow, how we accelerated inference with GPUs, and what we learned from experimenting with ONNX and Triton.
March 2022
NVIDIA Triton Inference Server (Triton) is an open-source inference serving software that maximizes performance and simplifies model deployment at scale. Triton supports multiple frameworks (TensorRT, TensorFlow, ONNX, PyTorch, and more) with custom CUDA and Python backends on GPU-/CPU-based …
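To illustrate the Python backend mentioned here, a minimal sketch of a Triton python-backend model.py that simply echoes its input; the tensor names are hypothetical and must match the model's config.pbtxt.

```python
# model.py for a Triton "python" backend model; echoes INPUT0 back as OUTPUT0.
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            x = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out = pb_utils.Tensor("OUTPUT0", x.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```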
March 2022
We'll go over NVIDIA Triton Inference Server software and what's new. Triton is an open-source inference-serving software for fast and scalable AI in applications. Learn how Triton helps deploy models from all popular frameworks — including TensorFlow, PyTorch, ONNX, TensorRT, RAPIDS FIL (for …
March 2022
Learn about the latest optimizations in NVIDIA's image/signal processing libraries like NPP, nvJPEG, and DALI — a fast, flexible data loading and augmentation library. We'll discuss how to use various data processing solutions spanning low-level image and signal processing primitives in NPP, image …
March 2022
Learn how the search engine works in online retail shopping. This overview introduces the AI technology that improves the retail business in the Thai local market, and how we used Thai ASR and the NVIDIA conversational AI platform to implement a voice search function. Users simply speak instead of typing, and the …
March 2022
I'll outline some practical lessons and challenges learned from training and deploying AI-driven learning systems that power personalization and recommendations for the world’s largest omni-channel retailer. How do we handle petabytes of omni-channel data to train recommender systems, …
March 2022
This session continues the technical deep dive and will cover the various technological components for scaling training and inferencing workloads using Kubernetes, Horovod, and Triton Inference Server. Learn the role of NVIDIA’s GPU and network operators, as well as the open-source components such …
March 2022
AI and machine learning operations (MLOps) engineers often struggle to deploy models on GPUs. AI models bring new requirements and challenges that aren't always aligned with traditional best practices for application deployment, while orchestrating GPUs in production brings additional new …
March 2022
Edge computing is transforming our cities, every space, and every daily workflow thanks to AI-enabled intelligent applications. Application developers and system integrators leverage NVIDIA AI and GPU solutions to accelerate the forward momentum. Aetina will share Pro-AI services — how …
March 2022
Giant transformer models with billions of parameters are achieving state-of-the-art results on various natural language processing tasks. Such large models require computation linear in the number of parameters. To combat this problem, Mixture of Experts (MoE) introduces an architecture where the …
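For intuition, here is a minimal PyTorch sketch of a top-1 MoE layer: a router activates one expert per token, so per-token compute stays roughly constant as experts (and thus parameters) are added. This is a simplified illustration, not the session's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """A router sends each token to exactly one expert, so per-token compute
    stays roughly constant no matter how many experts are added."""
    def __init__(self, d_model, n_experts):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x):                          # x: [tokens, d_model]
        gates = F.softmax(self.router(x), dim=-1)
        score, expert_idx = gates.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = expert_idx == i
            if sel.any():
                out[sel] = score[sel].unsqueeze(1) * expert(x[sel])
        return out

moe = Top1MoE(d_model=64, n_experts=4)
y = moe(torch.randn(10, 64))                       # shape [10, 64]
```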
March 2022
Big data and data science applications are central to a wide range of business operations, and are at the heart of countless products and services. Most of the utilities in this space, including scikit-learn, Pandas, NumPy, NetworkX, and Spark, have GPU-accelerated drop-in replacements (for example, …
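A minimal sketch of the drop-in pattern, assuming a RAPIDS installation and an NVIDIA GPU; the file and column names are hypothetical.

```python
import cudf                       # pandas-like API, data lives in GPU memory
from cuml.cluster import KMeans   # scikit-learn-like API, fitted on the GPU

df = cudf.read_csv("transactions.csv")                      # hypothetical data
spend = df.groupby("customer_id")["amount"].mean().reset_index()

km = KMeans(n_clusters=8)
spend["segment"] = km.fit_predict(spend[["amount"]])        # GPU clustering
```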
March 2022
Performance is a critical component of a successful deep learning application in health care. Biopharmaceutical innovation relies on high throughput and low latency to meet strict, and in some cases real-time, performance requirements while rapidly searching for signal in vast amounts of noise…
March 2022
Our experts are highly experienced with moving AI Inference models from research to production environments and are happy to share these experiences, tools, and techniques with you, including topics such as:
- Moving from research to production
- Minimizing device memory usage
- …
March 2022
In the “Introduction to NVIDIA DALI: GPU-Accelerated Data Preprocessing” session you learned DALI basics. In this session, you'll gain practical knowledge on how to use it effectively to accelerate a real-life deep-learning application. We'll do a case study for a particular framework and a particular …
March 2022
As GPUs’ computational power increases, providing deep-learning models with data as fast as it's consumed gets more difficult. Data preprocessing, typically executed on the CPU, becomes the bottleneck of the system. A solution to this problem is to offload data preprocessing to the GPU. …
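A minimal DALI pipeline sketch showing preprocessing offloaded to the GPU (device="mixed" decodes JPEGs with nvJPEG on the GPU); the dataset path is hypothetical.

```python
from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def train_pipeline():
    jpegs, labels = fn.readers.file(
        file_root="/data/images", random_shuffle=True)      # hypothetical path
    images = fn.decoders.image(jpegs, device="mixed")       # GPU (nvJPEG) decode
    images = fn.resize(images, resize_x=224, resize_y=224)
    images = fn.crop_mirror_normalize(
        images,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        output_layout="CHW")
    return images, labels

pipe = train_pipeline()
pipe.build()
images, labels = pipe.run()   # image batches come back already on the GPU
```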
March 2022
Jetson software is built to not only accelerate AI applications end-to-end, but also to accelerate time to market. Learn how we bring NVIDIA technologies to the edge on Jetson for building accelerated AI applications. We'll cover the entire Jetson software stack and NVIDIA TAO (Train-Adapt-Optimize), Riva,…
November 2021
Setting up an enterprise-grade AI software infrastructure is a complicated task for IT admins. To address this, VMware and NVIDIA have partnered together to deliver the AI-Ready Enterprise platform. Learn how enterprises can benefit from NVIDIA AI Enterprise certified on vSphere. We'll guide you through …
November 2021
Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an …
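The abstract doesn't name specific programming models, but the open-source Triton language (unrelated to Triton Inference Server, despite the name) is one well-known example of this kind of alternative; a minimal vector-add kernel sketch:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of elements; the block-level
    # view replaces per-thread SIMT reasoning.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 4096
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
```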
November 2021
Learn what’s new in NVIDIA Triton Inference Server. NVIDIA Triton is an open-source inference serving software that simplifies the deployment of AI models at scale in production. Deploy deep learning and machine learning models from any framework (TensorFlow, NVIDIA TensorRT, PyTorch, OpenVINO, …
November 2021
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. With the extended Arm NN custom backend, we can orchestrate the execution of multiple ML models and enable optimized CPU/GPU/NPU inference configurations on embedded systems, including NVIDIA's Jetson …
November 2021
Learn how Hugging Face achieved 1 millisecond Transformers inference for customers of its new Infinity solution. Transformers models conquered natural language processing with breakthrough accuracy. But these large models and complex architectures are too challenging for most companies to put in …
September 2022
Join NVIDIA's Triton product team to discuss highly performant and efficient inference using new functionalities in NVIDIA Triton Inference Server. Triton is an open-source inference serving software that helps standardize and streamline AI inference by enabling teams to deploy, run, and scale trained AI …
September 2022
Learn how to easily optimize and deploy every model for high-performance inference with Triton and TensorRT. Deploying deep learning models in production with high-performance inference is challenging. The deployment software needs to be able to support multiple frameworks, such as …
March 2023
Inference for AI, machine learning, and deep learning is expected to grow faster than training. But the complexity that teams must manage to deploy, run, and scale models in production is enormous: multiple frameworks, evolving model architectures, volume of queries, diverse computing platforms, …
March 2023
Leveraging the right infrastructure for serving inference can provide faster spin-up times and responsive auto-scaling, which is critical to users’ satisfaction and the ultimate success of your model. By using NVIDIA Triton Inference Server with the FasterTransformer backend, you can see up to 40% …
March 2023
Machine learning (ML) models are increasingly used to make business-critical decisions at Uber, ranging from content discovery and recommendations to estimated time of arrivals (ETAs). In recent years, we've seen the adoption of deep learning and other innovations to further unlock the value of …