Baseten offers optimized inference infrastructure powered by NVIDIA’s hardware and software to help solve the challenges of deployment scalability, cost efficiency, and expertise.
With automatic scaling, Baseten lets customers dynamically adjust the number of replicas serving their models based on consumer traffic and service-level agreements, so capacity meets demand without manual intervention. This helps optimize for cost, because Baseten’s infrastructure scales up or down with the number of requests reaching the model. Customers pay nothing when there is no activity, and once a request does come in, Baseten’s infrastructure, running on AWS EC2 instances powered by NVIDIA A100 Tensor Core GPUs, takes only 5–10 seconds to get the model up and running. Cold starts previously took up to five minutes, so this is a speedup of 30–60X. Customers can also choose from a variety of NVIDIA GPUs available on Baseten to accelerate model inference, including but not limited to NVIDIA A100, A10G, T4, and V100 Tensor Core GPUs.
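To make the scaling behavior concrete, here is a minimal Python sketch of a traffic-based replica calculation with scale-to-zero, plus the cold-start arithmetic quoted above. It is an illustration only, not Baseten’s actual autoscaler: the function names, the `target_per_replica` concurrency target, and the `max_replicas` cap are all hypothetical.

```python
import math


def desired_replicas(in_flight_requests: int,
                     target_per_replica: int,
                     max_replicas: int) -> int:
    """Hypothetical scaling rule: enough replicas to keep each one at or
    below its concurrency target, zero replicas when there is no traffic."""
    if in_flight_requests <= 0:
        return 0  # scale to zero: no activity, no running replicas
    needed = math.ceil(in_flight_requests / target_per_replica)
    return min(needed, max_replicas)  # never exceed the configured cap


def cold_start_speedup(old_seconds: float, new_seconds: float) -> float:
    """Ratio of the previous cold-start time to the new one."""
    return old_seconds / new_seconds


# Five minutes (300 s) down to 5-10 s gives the 30-60X range from the text.
print(cold_start_speedup(300, 10), cold_start_speedup(300, 5))
print(desired_replicas(0, 4, 10), desired_replicas(9, 4, 10))
```

A production autoscaler would typically also smooth traffic over a time window and delay scale-down to avoid thrashing; those details are left out here.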
On top of NVIDIA hardware, Baseten leverages optimized NVIDIA software. Using tensor parallelism, a feature of TensorRT-LLM, served on AWS, Baseten boosted inference performance for a customer’s LLM deployment by 2X through Truss, its open-source packaging and deployment library, which lets users deploy models in production with ease.
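For readers unfamiliar with tensor parallelism, the idea can be sketched in a few lines of plain Python: a layer’s weight matrix is split by columns into shards, each shard’s partial product is computed separately (in a real deployment, on its own GPU), and the partial outputs are concatenated. This is a conceptual toy, not TensorRT-LLM’s implementation or API; every function name here is hypothetical.

```python
def matmul(x, w):
    """Multiply row vector x (length d_in) by matrix w (d_in rows, d_out columns)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def split_columns(w, num_shards):
    """Split w column-wise into num_shards equal-width shards."""
    d_out = len(w[0])
    if d_out % num_shards != 0:
        raise ValueError("output width must divide evenly across shards")
    width = d_out // num_shards
    return [[row[s * width:(s + 1) * width] for row in w]
            for s in range(num_shards)]


def column_parallel_matmul(x, w, num_shards):
    """Compute x @ w shard by shard, then concatenate the partial outputs."""
    shards = split_columns(w, num_shards)
    # Each partial product is independent, so each could run on its own device.
    partials = [matmul(x, shard) for shard in shards]
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
# The sharded result matches the single-device result.
print(column_parallel_matmul(x, w, 2) == matmul(x, w))
```

Because each shard holds only a fraction of the weights, a model too large or too slow for one GPU can be spread across several, at the cost of communication to gather the partial results.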
TensorRT-LLM is included as a part of NVIDIA AI Enterprise, which provides a production-grade, secure, end-to-end software platform for enterprises building and deploying accelerated AI software.
NVIDIA’s full-stack AI inference approach plays a crucial role in meeting the stringent demands of Baseten’s customers’ real-time applications. With NVIDIA A100 GPUs and TensorRT-LLM optimizations, the underlying infrastructure unlocks both performance gains and cost savings for developers.
Explore more about Baseten by watching a quick demo of their product.