Triton GTC Sessions

37 sessions
March 2022
, CTO and Co-founder, OctoML
Developers wanting to build full-stack intelligent applications powered by AI are creatively constrained by deficiencies in today’s MLOps workflows. The complexity is two-fold: (1) There isn’t a fast path for getting the latest deep learning innovations, such as transformer models, quickly integrated into
March 2022
, ML Lead, Falkonry
, CTO, Falkonry
Falkonry’s AI exploits the computational power of GPUs to provide real-time insights from trillions of high-speed data points. AI for defense and industrial operations is constrained by two factors: speed and scale. Using an orderly three-way brokerage between data, models, and GPU resources, supported
March 2022
, Senior Engineering Manager, Meta
, Meta
We'll cover the end-to-end story, starting with building a prototype with PyTorch on GCP Vertex Workbench and going all the way to production deployment on Vertex AI Prediction with NVIDIA Triton. The most important part of the talk: we won't have to rewrite code after having a Jupyter Notebook with
March 2022
, Developer Relations Manager, NVIDIA
, Program Manager, Azure Machine Learning, Microsoft
, Solutions Architect, NVIDIA
We'll discuss the Triton Model Analyzer tool and its role in generating the optimal model configuration for hosting on AzureML. We'll cover optimizations of various hyperparameters, along with framework-specific optimizations that can be swept using Model Analyzer. The reports generated
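The idea behind a configuration sweep can be sketched as a brute-force search: try each combination of configuration parameters, measure it, and keep the highest-throughput configuration that meets a latency budget. This is a toy sketch in Python, not Model Analyzer's implementation; the `measure` function and its latency formula are hypothetical stand-ins for real profiling, and the parameter names only mirror Triton's `max_batch_size` and model instance count settings.

```python
from itertools import product


def measure(max_batch_size, instance_count):
    """Hypothetical stand-in for profiling one configuration.

    Returns (throughput in inferences/sec, latency in ms). The formula is
    invented for illustration; a real sweep would measure these values.
    """
    latency_ms = 5.0 + 1.5 * max_batch_size / instance_count
    throughput = 1000.0 * max_batch_size * instance_count / latency_ms
    return throughput, latency_ms


def sweep(batch_sizes, instance_counts, latency_budget_ms):
    """Return the best (batch_size, instances, throughput, latency) under the budget."""
    best = None
    for bs, ic in product(batch_sizes, instance_counts):
        throughput, latency = measure(bs, ic)
        if latency > latency_budget_ms:
            continue  # configuration violates the latency constraint
        if best is None or throughput > best[2]:
            best = (bs, ic, throughput, latency)
    return best


best = sweep([1, 2, 4, 8, 16, 32], [1, 2], latency_budget_ms=20.0)
print(best[:2])  # (16, 2) under this toy latency model
```

Exhaustive search is only practical for small grids like this one; larger search spaces call for pruning or smarter search strategies.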
March 2022
, Global Partner/Customer Partner Manager Financial Services, NVIDIA
, Senior Engagement Manager, Quantiphi
Auto insurers in the United States alone are losing almost $30 billion annually due to errors in damage estimation. The poor quality of vehicle images is a major contributing factor. Identifying grainy photos uploaded with claim submissions is a practical challenge. Failure to accurately identify noisy images may
March 2022
, Data Scientist, SVA System Vertrieb Alexander GmbH
, Data Scientist, SVA System Vertrieb Alexander GmbH
Learn how natural language processing can power enterprise search systems, using the unmatched abilities of transformer models and NVIDIA infrastructure for semantic search and question answering. We'll take a deep dive into how Transformers using the power of
March 2022
, Staff Engineer, Alibaba Cloud
, Senior Engineer, Alibaba Cloud
, Staff Engineer, Alibaba Cloud
Triton Inference Server is a full-featured, extensible, and powerful inference solution for both the edge and the cloud. When deploying Triton to production in the cloud, efficiency, scalability, and integration with infrastructure beyond the server itself must be taken into consideration. We'll
March 2022
, Senior AI Developer Technology Engineer, NVIDIA
With GPT-3, it's impossible to run inference for the entire model on a single GPU. We must extend to multiple GPUs, or even serve across multiple nodes. We'll demonstrate how to integrate FasterTransformer, a highly optimized and flexible transformer library, with Triton inference
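A back-of-the-envelope calculation shows why a single GPU is not enough. This sketch assumes GPT-3's published size of 175 billion parameters, FP16 weights (2 bytes each), and an 80 GB GPU; it counts weights only and ignores activations and the attention key/value cache, so the real memory requirement is higher.

```python
import math

PARAMS = 175e9         # GPT-3 parameter count
BYTES_PER_PARAM = 2    # assumption: FP16 weights
GPU_MEMORY_GB = 80     # assumption: an 80 GB GPU

# Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes here).
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
# Lower bound on GPUs needed to hold the weights alone.
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)

print(f"weights: {weights_gb:.0f} GB, minimum GPUs for weights alone: {min_gpus}")
# weights: 350 GB, minimum GPUs for weights alone: 5
```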
March 2022
, Manager of Speech Team, LINE Corp.
, Senior Solution Architect, NVIDIA
LINE has developed a fast and accurate speech recognition service using a state-of-the-art streaming end-to-end (E2E) model. Our service is required to return recognition results with low latency for each speech request posted by many clients. However, it's hard to satisfy that requirement using the E2E
March 2022
, Product Manager, NVIDIA
, Technical Marketing Engineer, NVIDIA
, Deep Learning Solutions Architect, NVIDIA
, Product Manager, NVIDIA
, Group Product Manager, NVIDIA
, Principal Engineer, NVIDIA
, TensorRT Engineering Manager, NVIDIA
In this session, we focus on optimized deep learning inference and deployment of AI models in production. We'll discuss how NVIDIA TensorRT, an SDK for high-performance deep learning inference, can deliver low latency and high throughput for deep learning applications. We'll then talk
March 2022
, AI Developer Technology Engineer, NVIDIA
Merlin HugeCTR is a recommender-system-specific framework that accelerates the training and deployment of complex deep learning models on GPUs at scale. Since its public release in early 2020, we've added many performance and usability enhancements. We'll introduce some of them,
March 2022
, Senior Data Scientist, NVIDIA
, Software Engineer, Machine Learning Infrastructure, LinkedIn
, Software Engineer, Machine Learning Infrastructure, LinkedIn
, Staff Software Engineer, LinkedIn
Learn how LinkedIn's model serving has evolved to support deep learning with TensorFlow, how we accelerated inference with GPUs, and how we experimented with ONNX and Triton.
March 2022
, Solutions Architecture, NVIDIA
, Solutions Architect, Machine Learning, Google Cloud
, Software Engineer, Vertex AI Prediction, Google Cloud
NVIDIA Triton Inference Server (Triton) is open-source inference-serving software that maximizes performance and simplifies model deployment at scale. Triton supports multiple frameworks (TensorRT, TensorFlow, ONNX, PyTorch, and more) with custom CUDA and Python backends on GPU-/CPU-based
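As one illustration of a framework-agnostic interface: Triton's HTTP endpoint follows the KServe v2 inference protocol, so a client sends the same JSON request shape regardless of which backend runs the model. This sketch only builds such a request with the Python standard library; the model name `my_model`, the tensor name `INPUT0`, the `FP32` datatype, and the server address `localhost:8000` are assumptions that must match your own model configuration and deployment.

```python
import json
import urllib.request


def build_infer_request(host, model_name, input_name, values, shape):
    """Build (but do not send) a KServe v2 JSON inference request."""
    body = {
        "inputs": [
            {
                "name": input_name,   # must match the model's input name
                "shape": shape,
                "datatype": "FP32",   # assumption: a float32 input tensor
                "data": values,       # flattened, row-major tensor contents
            }
        ]
    }
    url = f"http://{host}/v2/models/{model_name}/infer"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model and tensor names, for illustration only.
req = build_infer_request("localhost:8000", "my_model", "INPUT0",
                          [1.0, 2.0, 3.0, 4.0], [1, 4])
# urllib.request.urlopen(req) would send it to a running server.
```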
March 2022
, Product Manager, NVIDIA
, Product Marketing Manager, NVIDIA
We'll go over NVIDIA Triton Inference Server software and what's new. Triton is open-source inference-serving software for fast and scalable AI in applications. Learn how Triton helps deploy models from all popular frameworks, including TensorFlow, PyTorch, ONNX, TensorRT, RAPIDS FIL (for
March 2022
, Senior Software Development Engineer, NVIDIA
, Software Development Engineer, NVIDIA
, Senior Software Development Engineer, NVIDIA
, Senior CUDA Image Processing Engineer, NVIDIA
, Senior Deep Learning Software Engineer, NVIDIA
, Software Development Manager in the Math Libraries Team, NVIDIA
, Software Engineer in the CUDA Image Processing Libraries Team, NVIDIA
, Senior Software Development Engineer, NVIDIA
, Deep Learning Manager, NVIDIA
, Senior Software Development Engineer, NVIDIA
, Senior CUDA Math Library Engineer and Team Lead, NVIDIA
Learn about the latest optimizations in NVIDIA's image/signal processing libraries like NPP, nvJPEG, and DALI — a fast, flexible data loading and augmentation library. We'll discuss how to use various data processing solutions spanning low-level image and signal processing primitives in NPP, image
March 2022
, Assistant Chief Information Technology Officer, CP All Public Company Limited (Thailand)
, Technology Manager, Gosoft (Thailand) Co., Ltd.
, Robot Engineer, Gosoft (Thailand) Co., Ltd.
, Principal Data Scientist, Gosoft (Thailand) Co., Ltd.
, Deputy General Manager, CP All Public Company Limited (Thailand)
, Head of Asia Pacific South (APS) Regional DevRel, NVIDIA
Learn how the search engine works in online retail shopping. This overview introduces the AI technology that improves retail business in the local Thai market, and how we used Thai ASR and the NVIDIA Conversational AI Platform to implement the voice search function. Users simply speak instead of typing, and the
March 2022
, VP - Personalization and Recommendations, Walmart Global Tech
I'll outline practical lessons learned and challenges faced in training and deploying AI-powered learning systems that power personalization and recommendations for the world’s largest omni-channel retailer. How do we handle petabytes of omni-channel data to train recommender systems,
March 2022
, Technical Marketing Engineer, NVIDIA
, Senior Technical Marketing Engineer, NVIDIA
This session continues the technical deep dive and will cover the various technological components for scaling training and inferencing workloads using Kubernetes, Horovod, and Triton Inference Server. Learn the role of NVIDIA’s GPU and network operators, as well as the open-source components such
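One common way to scale inference pods on Kubernetes is the Horizontal Pod Autoscaler (HPA); the abstract does not say which autoscaling mechanism the session covers, so treat this as an assumption. The HPA's core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). This sketch evaluates only that rule and ignores the HPA's tolerance band and stabilization window; the GPU-utilization metric is hypothetical and would have to be exposed to the HPA through a custom-metrics pipeline.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core Kubernetes HPA scaling rule (tolerance and stabilization omitted)."""
    return math.ceil(current_replicas * current_metric / target_metric)


# Hypothetical: 3 Triton pods averaging 90% GPU utilization against a 60% target.
print(desired_replicas(3, 90, 60))  # 3 * 90 / 60 = 4.5, rounded up to 5
# Load drops to 20% average utilization across 5 pods: scale in.
print(desired_replicas(5, 20, 60))  # 5 * 20 / 60 is about 1.67, rounded up to 2
```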
March 2022
, Product Architect, NVIDIA
, CTO, Run:AI
AI and machine learning operations (MLOps) engineers often struggle to deploy models on GPUs. AI models bring new requirements and challenges that aren't always aligned with traditional best practices for application deployment, while orchestrating GPUs in production brings additional new
March 2022
, Not Applicable, Aetina Corporation
Edge computing is transforming our cities, every space, and every daily workflow, thanks to AI-enabled intelligent applications. Application developers and system integrators leverage NVIDIA AI and GPU solutions to accelerate this momentum. Aetina will share Pro-AI services: how
March 2022
, Senior AI Developer Technology Engineer, NVIDIA
, Principal Researcher, Microsoft
Giant transformer models with billions of parameters are achieving state-of-the-art results on various natural language processing tasks. Such large models require computation linear in the number of parameters. To combat this problem, Mixture of Experts (MoE) introduces an architecture where the
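The abstract is cut off before it describes the MoE architecture, so the following is only a generic sketch of the common top-k gating idea, not the presenters' implementation: a small gate scores every expert for each token, and only the k highest-scoring experts run, so per-token compute grows with k rather than with the total number of experts (and hence parameters). The gate weights and toy experts are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, gate_weights, experts, k=1):
    """Route one token to its top-k experts and mix their outputs.

    token: list of floats.
    gate_weights: one weight vector per expert (a linear gate, no bias).
    experts: callables mapping a token to an output list of the same length.
    Returns (output, indices of the experts that actually ran).
    """
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in gate_weights]
    probs = softmax(logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts
    out = [0.0] * len(token)
    for i in top:  # only k experts execute, however many exist
        y = experts[i](token)
        out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
    return out, top


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda t, s=s: [s * v for v in t] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0], [0.5, 0.0], [2.0, 0.0], [1.0, 0.0]]

out, ran = moe_forward([1.0, 0.0], gate_weights, experts, k=1)
print(ran, out)  # [2] [3.0, 0.0]
```

With k=1 only one of the four experts is evaluated for this token; production MoE layers add details such as load-balancing losses and expert capacity limits that this sketch omits.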
March 2022
, Senior Deep Learning Solution Architect, NVIDIA
, Senior Deep Learning Data Scientist, NVIDIA
Big data and data science applications are central to a wide range of business operations, and are at the heart of countless products and services. Most of the utilities in this space, including scikit-learn, Pandas, NumPy, NetworkX, and Spark, have GPU-accelerated drop-in replacements (for example,
March 2022
, Data Scientist, Healthcare, NVIDIA
Performance is a critical component of a successful deep learning application in health care. Biopharmaceutical innovation relies on high throughput and low latency to meet strict, and in some cases real-time, performance requirements while rapidly searching for signal in vast amounts of noise
March 2022
, Developer Technology Engineer, NVIDIA
, Principal Engineer, NVIDIA
, Developer Technology Engineer, NVIDIA
, Senior Developer Technology Engineer, NVIDIA
, Developer Technology Engineer, NVIDIA
, Software Engineer, NVIDIA
, Engineer, QA, NVIDIA
Our experts are highly experienced in moving AI inference models from research to production environments and are happy to share these experiences, tools, and techniques with you, including topics such as: - Moving from research to production - Minimizing device memory usage -
March 2022
, Software Development Engineer, NVIDIA
In the “Introduction to NVIDIA DALI: GPU-Accelerated Data Preprocessing” session, you learned DALI basics. In this session, you'll gain practical knowledge of how to use it effectively to accelerate a real-life deep learning application. We'll present a case study for a particular framework and a particular
March 2022
, Senior Software Development Engineer, NVIDIA
As GPUs’ computational power increases, providing deep-learning models with data as fast as it's consumed gets more difficult. Data preprocessing, typically executed on the CPU, becomes the bottleneck of the system. A solution to this problem is to offload data preprocessing to the GPU.
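The bottleneck argument can be made concrete with a two-stage throughput estimate. This sketch assumes CPU preprocessing overlaps fully with GPU compute (so the slower stage sets the step time) and, conservatively, that preprocessing moved onto the GPU does not overlap with compute (so the two times add). All timings are hypothetical and chosen only to illustrate the arithmetic.

```python
def steps_per_second(preprocess_ms, compute_ms, overlapped):
    """Steps per second for a two-stage loader -> model pipeline.

    overlapped=True: stages run concurrently, so the slower one sets the pace.
    overlapped=False: stages run back to back, so their times add.
    """
    if overlapped:
        step_ms = max(preprocess_ms, compute_ms)
    else:
        step_ms = preprocess_ms + compute_ms
    return 1000.0 / step_ms


# Hypothetical per-batch timings: CPU preprocessing is 4x slower than the GPU step.
cpu_bound = steps_per_second(preprocess_ms=40.0, compute_ms=10.0, overlapped=True)
# Hypothetical GPU preprocessing time, conservatively serialized with compute.
gpu_offload = steps_per_second(preprocess_ms=5.0, compute_ms=10.0, overlapped=False)

print(f"CPU preprocessing: {cpu_bound:.1f} steps/s")    # 25.0 steps/s
print(f"GPU preprocessing: {gpu_offload:.1f} steps/s")  # 66.7 steps/s
```

Even without any overlap, the offloaded pipeline is faster here because the slow CPU stage no longer caps throughput.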
March 2022
, Product Manager, NVIDIA
Jetson software is built to not only accelerate AI applications end-to-end, but also to accelerate time to market. Learn how we bring NVIDIA technologies to the edge on Jetson for building accelerated AI applications. We'll cover the entire Jetson software stack and NVIDIA TAO (Train-Adapt-Optimize), Riva,
November 2021
, NVIDIA
, NVIDIA
Setting up an enterprise-grade AI software infrastructure is a complicated task for IT admins. To address this, VMware and NVIDIA have partnered to deliver the AI-Ready Enterprise platform. Learn how enterprises can benefit from NVIDIA AI Enterprise certified on vSphere. We'll guide you through
November 2021
, OpenAI
Traditional single instruction, multiple threads (SIMT) programming with CUDA, for all its benefits, can be daunting to machine learning researchers in need of fast custom kernels. We'll shed light on alternative programming models capable of improving GPU programmability without too much of an
November 2021
, NVIDIA
, NVIDIA
Learn what’s new in NVIDIA Triton Inference Server. NVIDIA Triton is open-source inference-serving software that simplifies the deployment of AI models at scale in production. Deploy deep learning and machine learning models from any framework (TensorFlow, NVIDIA TensorRT, PyTorch, OpenVINO,
November 2021
, Arcturus
, Arcturus
, Arm
, Artisight
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. With the extended Arm NN custom backend, we can orchestrate multiple ML model execution and enable optimized CPU/GPU/NPU inference configurations on embedded systems, including NVIDIA's Jetson
November 2021
, Hugging Face
Learn how Hugging Face achieved 1-millisecond Transformers inference for customers of its new Infinity solution. Transformer models have conquered natural language processing with breakthrough accuracy. But these large models and complex architectures are too challenging for most companies to put in
September 2022
, Developer Advocate Engineer for Deep Learning SW, NVIDIA
, Product Marketing Manager, NVIDIA
Join NVIDIA's Triton product team to discuss high-performance, efficient inference using new functionality in NVIDIA Triton Inference Server. Triton is open-source inference-serving software that helps standardize and streamline AI inference by enabling teams to deploy, run, and scale trained AI
September 2022
, Group Product Manager, NVIDIA
Learn how to easily optimize and deploy any model with Triton and TensorRT for high-performance inference. Deploying deep learning models in production with high-performance inference is challenging. The deployment software needs to support multiple frameworks, such as
March 2023
, Product Marketing Manager, NVIDIA
AI inference, spanning machine learning and deep learning, is expected to grow faster than training. But the complexity that teams must manage to deploy, run, and scale models in production is enormous: multiple frameworks, evolving model architectures, volume of queries, diverse computing platforms,
March 2023
, Product Marketing Manager, NVIDIA
, CTO, CoreWeave
Leveraging the right infrastructure for serving inference can provide faster spin-up times and responsive auto-scaling, which is critical to users’ satisfaction and the ultimate success of your model. By using NVIDIA Triton Inference Server with the FasterTransformer backend, you can see up to 40%
March 2023
, Principal Software Engineer, Uber
, Engineering Manager, Uber
Machine learning (ML) models are increasingly used to make business-critical decisions at Uber, ranging from content discovery and recommendations to estimated times of arrival (ETAs). In recent years, we've seen the adoption of deep learning and other innovations to further unlock the value of