Triton GTC Sessions
37 sessions
March 2022
Developers wanting to build full-stack intelligent applications powered by AI are creatively constrained by deficiencies in today’s MLOps workflows. The complexity is two-fold: (1) There isn’t a fast path for getting the latest deep learning innovations, such as transformer models, quickly integrated into …
March 2022
Falkonry’s AI exploits the computational power of GPUs to provide real-time insights across trillions of high-speed data points. AI for defense and industrial operations is constrained by two factors: speed and scale. Using an orderly three-way brokerage between data, models, and GPU resources, supported …
March 2022
We'll cover the end-to-end story, starting with building a prototype with PyTorch on GCP Vertex AI Workbench and going all the way to production deployment on Vertex AI Prediction with NVIDIA Triton. The most important part of the talk: we won't have to rewrite code after having a Jupyter Notebook with …
March 2022
We'll discuss the Triton Model Analyzer tool and its role in generating the optimal model configuration for hosting on AzureML. We'll cover optimizations of various hyperparameters, along with framework-specific optimizations that can be swept using Model Analyzer. The reports generated …
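For flavor, here is a toy sketch of the kind of configuration sweep Model Analyzer automates; Model Analyzer itself is driven from its own CLI and does far more (instance-count and dynamic-batching sweeps, report generation). The server URL, model name, and tensor name below are hypothetical.

```python
# Toy sweep: measure single-request latency at several batch sizes against a
# running Triton server, then report the best latency per sample.
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def latency(model_name, batch):
    data = np.random.rand(batch, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("input__0", list(data.shape), "FP32")  # hypothetical tensor name
    inp.set_data_from_numpy(data)
    start = time.perf_counter()
    client.infer(model_name, [inp])
    return time.perf_counter() - start

results = {b: latency("resnet50", b) for b in (1, 2, 4, 8)}  # hypothetical model name
best = min(results.items(), key=lambda kv: kv[1] / kv[0])
print(f"best batch size: {best[0]} ({best[1] * 1e3:.1f} ms per request)")
```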
March 2022
Auto insurers in the United States alone are losing almost $30 billion annually due to errors in damage estimation. The poor quality of vehicle images is a major contributing factor. Identifying grainy photo uploads in claims submission is a practical challenge. Failure to accurately identify noisy images may …
March 2022
Learn how we can use natural language processing for enterprise search systems, leveraging the unmatched abilities of transformer models and NVIDIA infrastructure for semantic search and question answering. We'll take a deep dive into how transformers, using the power of …
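As a minimal illustration of transformer-based semantic search (not the session's own stack; the model name and corpus below are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder model choice
docs = ["How do I reset my password?",
        "Expense report submission policy",
        "Setting up the corporate VPN"]
doc_emb = model.encode(docs, convert_to_tensor=True)

# Embed the query and retrieve the nearest document by cosine similarity.
query_emb = model.encode("I forgot my login credentials", convert_to_tensor=True)
hits = util.semantic_search(query_emb, doc_emb, top_k=1)
print(docs[hits[0][0]["corpus_id"]])
```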
March 2022
Triton Inference Server is a full-featured, extensible, and powerful inferencing solution for both edge and cloud. When deploying Triton to production in the cloud, efficiency, scalability, and integration with infrastructure beyond the server itself must be taken into consideration. We'll …
March 2022
With GPT-3, it's impossible to run inference for the entire model on a single GPU; we must extend to multi-GPU, or even multi-node, serving. We'll demonstrate how to integrate FasterTransformer, a highly optimized and flexible transformer library, with Triton inference …
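For intuition, here is a conceptual PyTorch sketch of column-parallel matrix multiplication, the core idea behind tensor-parallel inference of large transformers. This is not FasterTransformer code, and it assumes two visible GPUs.

```python
import torch

def column_parallel_linear(x, weight_shards):
    # Each weight shard lives on its own GPU; each GPU computes a slice of
    # the output, and the slices are gathered and concatenated at the end.
    parts = [x.to(w.device) @ w for w in weight_shards]
    return torch.cat([p.to(parts[0].device) for p in parts], dim=-1)

hidden = 1024
w = torch.randn(hidden, 4 * hidden)
# Split the weight column-wise across two devices.
shards = [w[:, :2 * hidden].to("cuda:0"), w[:, 2 * hidden:].to("cuda:1")]
x = torch.randn(8, hidden, device="cuda:0")
y = column_parallel_linear(x, shards)  # equals x @ w, computed across 2 GPUs
```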
March 2022
LINE has developed a fast and accurate speech recognition service using a state-of-the-art streaming end-to-end (E2E) model. Our service is required to return recognition results with low latency for each speech request posted by many clients. However, it's hard to satisfy that requirement using the E2E …
March 2022
In this session, we focus on optimized deep learning inference and deployment of AI models in production. We'll discuss how NVIDIA TensorRT, an SDK for high-performance deep learning inference, can deliver low latency and high throughput for deep learning applications. We'll then talk …
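As a taste of the TensorRT workflow, here is a minimal sketch that builds an FP16 engine from an ONNX model with the TensorRT Python API; it is written against the TensorRT 8.x API, and the file names are placeholders.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # enable FP16 for lower latency

engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:          # deployable engine file
    f.write(engine)
```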
March 2022
Merlin HugeCTR is a recommender system-specific framework which accelerates the training and deployment of complex deep learning models on GPUs at scale. Since its public release in early 2020, we've added a lot of enhancements to performance and usability. We'll introduce some of them, …
March 2022
Learn how LinkedIn's model serving has evolved to support deep learning with TensorFlow, how we accelerated inference with GPUs, and what we learned from experimenting with ONNX and Triton.
March 2022
NVIDIA Triton Inference Server (Triton) is an open-source inference serving software that maximizes performance and simplifies model deployment at scale. Triton supports multiple frameworks (TensorRT, TensorFlow, ONNX, PyTorch, and more) with custom CUDA and Python backends on GPU-/CPU-based …
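To illustrate the Python backend mentioned here, a minimal sketch of a Triton python-backend model.py that simply echoes its input; the tensor names are hypothetical and must match the model's config.pbtxt.

```python
# model.py for a Triton "python" backend model; echoes INPUT0 back as OUTPUT0.
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            x = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            out = pb_utils.Tensor("OUTPUT0", x.as_numpy())
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```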
March 2022
We'll go over NVIDIA Triton Inference Server software and what's new. Triton is an open-source inference-serving software for fast and scalable AI in applications. Learn how Triton helps deploy models from all popular frameworks — including TensorFlow, PyTorch, ONNX, TensorRT, RAPIDS FIL (for …
March 2022
Learn about the latest optimizations in NVIDIA's image/signal processing libraries like NPP, nvJPEG, and DALI — a fast, flexible data loading and augmentation library. We'll discuss how to use various data processing solutions spanning low-level image and signal processing primitives in NPP, image …
March 2022
Learn how the search engine works in online retail shopping. This overview introduces the AI technology that improves the retail business in the Thai local market, and how we used Thai ASR and the NVIDIA conversational AI platform to implement a voice search function. Users simply speak instead of typing, and the …
March 2022
I'll outline some practical lessons and challenges learned from training and deploying AI-driven learning systems that power personalization and recommendations for the world’s largest omni-channel retailer. How do we handle petabytes of omni-channel data to train recommender systems, …
March 2022
This session continues the technical deep dive and will cover the various technological components for scaling training and inferencing workloads using Kubernetes, Horovod, and Triton Inference Server. Learn the role of NVIDIA’s GPU and network operators, as well as the open-source components such …
March 2022
AI and machine learning operations (MLOps) engineers often struggle to deploy models on GPUs. AI models bring new requirements and challenges that aren't always aligned with traditional best practices for application deployment, while orchestrating GPUs in production brings additional new …
March 2022
Edge computing is transforming our cities, every space, and every daily workflow thanks to AI-enabled intelligent applications. Application developers and system integrators leverage NVIDIA AI and GPU solutions to accelerate the forward momentum. Aetina will share Pro-AI services — how …
March 2022
Giant transformer models with billions of parameters are achieving state-of-the-art results on various natural language processing tasks. Such large models require computation linear in the number of parameters. To combat this problem, Mixture of Experts (MoE) introduces an architecture where the …
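For intuition, here is a minimal PyTorch sketch of a top-1 MoE layer: a router activates one expert per token, so per-token compute stays roughly constant as experts (and thus parameters) are added. This is a simplified illustration, not the session's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """A router sends each token to exactly one expert, so per-token compute
    stays roughly constant no matter how many experts are added."""
    def __init__(self, d_model, n_experts):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x):                          # x: [tokens, d_model]
        gates = F.softmax(self.router(x), dim=-1)
        score, expert_idx = gates.max(dim=-1)      # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            sel = expert_idx == i
            if sel.any():
                out[sel] = score[sel].unsqueeze(1) * expert(x[sel])
        return out

moe = Top1MoE(d_model=64, n_experts=4)
y = moe(torch.randn(10, 64))                       # shape [10, 64]
```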
March 2022
Big data and data science applications are central to a wide range of business operations, and are at the heart of countless products and services. Most of the utilities in this space, including scikit-learn, Pandas, NumPy, NetworkX, and Spark, have GPU-accelerated drop-in replacements (for example, …
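A minimal sketch of the drop-in pattern, assuming a RAPIDS installation and an NVIDIA GPU; the file and column names are hypothetical.

```python
import cudf                       # pandas-like API, data lives in GPU memory
from cuml.cluster import KMeans   # scikit-learn-like API, fitted on the GPU

df = cudf.read_csv("transactions.csv")                      # hypothetical data
spend = df.groupby("customer_id")["amount"].mean().reset_index()

km = KMeans(n_clusters=8)
spend["segment"] = km.fit_predict(spend[["amount"]])        # GPU clustering
```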
March 2022
Performance is a critical component of a successful deep learning application in health care. Biopharmaceutical innovation relies on high throughput and low latency to meet strict, and in some cases real-time, performance requirements while rapidly searching for signal in vast amounts of noise…
March 2022
Our experts are highly experienced with moving AI Inference models from research to production environments and are happy to share these experiences, tools, and techniques with you, including topics such as:
- Moving from research to production
- Minimizing device memory usage
- …
March 2022
In the “Introduction to NVIDIA DALI: GPU-Accelerated Data Preprocessing” session you learned DALI basics. In this session, you'll gain practical knowledge on how to use it effectively to accelerate a real-life deep-learning application. We'll do a case study for a particular framework and a particular …
March 2022
As GPUs’ computational power increases, providing deep-learning models with data as fast as it's consumed gets more difficult. Data preprocessing, typically executed on the CPU, becomes the bottleneck of the system. A solution to this problem is to offload data preprocessing to the GPU. …
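A minimal DALI pipeline sketch showing preprocessing offloaded to the GPU (device="mixed" decodes JPEGs with nvJPEG on the GPU); the dataset path is hypothetical.

```python
from nvidia.dali import pipeline_def, fn

@pipeline_def(batch_size=32, num_threads=4, device_id=0)
def train_pipeline():
    jpegs, labels = fn.readers.file(
        file_root="/data/images", random_shuffle=True)      # hypothetical path
    images = fn.decoders.image(jpegs, device="mixed")       # GPU (nvJPEG) decode
    images = fn.resize(images, resize_x=224, resize_y=224)
    images = fn.crop_mirror_normalize(
        images,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255],
        output_layout="CHW")
    return images, labels

pipe = train_pipeline()
pipe.build()
images, labels = pipe.run()   # image batches come back already on the GPU
```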
March 2022
Jetson software is built to not only accelerate AI applications end-to-end, but also to accelerate time to market. Learn how we bring NVIDIA technologies to the edge on Jetson for building accelerated AI applications. We'll cover the entire Jetson software stack and NVIDIA TAO (Train-Adapt-Optimize), Riva,…
November 2021
Setting up an enterprise-grade AI software infrastructure is a complicated task for IT admins. To address this, VMware and NVIDIA have partnered together to deliver the AI-Ready Enterprise platform. Learn how enterprises can benefit from NVIDIA AI Enterprise certified on vSphere. We'll guide you through …
November 2021
Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an …
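The abstract doesn't name specific programming models, but the open-source Triton language (unrelated to Triton Inference Server, despite the name) is one well-known example of this kind of alternative; a minimal vector-add kernel sketch:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one block of elements; the block-level
    # view replaces per-thread SIMT reasoning.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 4096
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
```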
November 2021
Learn what’s new in NVIDIA Triton Inference Server. NVIDIA Triton is an open-source inference serving software that simplifies the deployment of AI models at scale in production. Deploy deep learning and machine learning models from any framework (TensorFlow, NVIDIA TensorRT, PyTorch, OpenVINO, …
November 2021
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. With the extended Arm NN custom backend, we can orchestrate the execution of multiple ML models and enable optimized CPU/GPU/NPU inference configurations on embedded systems, including NVIDIA's Jetson …
November 2021
Learn how Hugging Face achieved 1 millisecond Transformers inference for customers of its new Infinity solution. Transformers models conquered natural language processing with breakthrough accuracy. But these large models and complex architectures are too challenging for most companies to put in …
September 2022
Join NVIDIA's Triton product team to discuss highly performant and efficient inference using new functionalities in NVIDIA Triton Inference Server. Triton is an open-source inference serving software that helps standardize and streamline AI inference by enabling teams to deploy, run, and scale trained AI …
September 2022
Learn how to easily optimize and deploy every model for high-performance inference with Triton and TensorRT. Deploying deep learning models in production with high-performance inference is challenging. The deployment software needs to be able to support multiple frameworks, such as …
March 2023
Inference for AI, machine learning, and deep learning is expected to grow faster than training. But the complexity that teams must manage to deploy, run, and scale models in production is enormous: multiple frameworks, evolving model architectures, volume of queries, diverse computing platforms, …
March 2023
Leveraging the right infrastructure for serving inference can provide faster spin-up times and responsive auto-scaling, which is critical to users’ satisfaction and the ultimate success of your model. By using NVIDIA Triton Inference Server with the FasterTransformer backend, you can see up to 40% …
March 2023
Machine learning (ML) models are increasingly used to make business-critical decisions at Uber, ranging from content discovery and recommendations to estimated time of arrivals (ETAs). In recent years, we've seen the adoption of deep learning and other innovations to further unlock the value of …