Evolution Path of GPU Inference Acceleration in the Breeno Bot/NLP Scenario
, Architect, OPPO
, Not Applicable, OPPO Inc.
OPPO, one of the leading smartphone manufacturers, launched its virtual voice assistant Breeno, a bot similar to Siri. Natural language processing (NLP) is a key technology behind it. We have explored several ways to accelerate model inference on GPUs in the Bot/NLP scenario, including TVM, ONNX Runtime, and TensorRT. We also designed a framework that serves as our integrated inference engine for both the NLP and the Recommendation/Search/Ads scenarios. In this session, we will share our experience with GPU inference acceleration and with the inference engine framework.