From Zero to Millions: Scaling Large Language Model Inference With TensorRT-LLM
Head of AI Inference, Perplexity AI
We'll give an overview of how we used TensorRT-LLM to deploy large language models at scale and serve millions of users at Perplexity.