Scalable, Accelerated Hardware-agnostic ML Inference with NVIDIA Triton and Arm NN
, Arcturus
, Arcturus
, Arm
, Artisight
NVIDIA Triton Inference Server simplifies the deployment of AI models at scale in production. With the extended Arm NN custom backend, we can orchestrate the execution of multiple ML models and enable optimized CPU/GPU/NPU inference configurations on embedded systems, including NVIDIA's Jetson family of devices and the Raspberry Pi. We will introduce the Triton Inference Server Arm NN backend architecture and present accelerated embedded use cases it enables.
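To make the setup concrete, a Triton model in a model repository is described by a `config.pbtxt` that names the backend handling inference. A minimal sketch of such a configuration is below; the model name, tensor names, and shapes are illustrative, and the `armnn_tflite` backend identifier assumes the Arm NN TFLite backend build discussed in the talk:

```
name: "mobilenet_v2"          # illustrative model name
backend: "armnn_tflite"       # route inference through the Arm NN backend
max_batch_size: 0             # model handles its own (fixed) batch dimension
input [
  {
    name: "input"             # tensor name as exported in the model file
    data_type: TYPE_FP32
    dims: [ 1, 224, 224, 3 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 1001 ]
  }
]
```

Placed alongside the model file in Triton's standard repository layout (`<model-repository>/mobilenet_v2/config.pbtxt`), this lets the server dispatch requests to the Arm NN backend, which can then select an optimized CPU, GPU, or NPU execution path on the target device.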