Industry: All Industries
Customer: Perplexity
Use Case: Generative AI / LLMs
Product: NVIDIA NeMo
Objective: Perplexity aims to quickly customize frontier models to improve the accuracy and quality of search results and to optimize them for lower latency and higher throughput, delivering a better user experience.
Perplexity is a groundbreaking AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
Although the internet provides access to a wealth of information and fields countless questions every year, the conventional approach to search requires users to sift through multiple sources to find and synthesize the insights they need.
To address this, Perplexity created an "answer engine" that offers a more efficient way to obtain information: when asked a question, it delivers a concise, relevant answer directly, saving users time and improving the search experience.
Every search has a different intent, and Perplexity relies on a network of large language models (LLMs) to generate grounded results. To enable this, the Perplexity team needed tools that could easily and efficiently scale the model customization process with advanced tuning techniques.
Key Takeaways
Perplexity adopted NVIDIA NeMo, leveraging its reliability, flexibility, and ease of use to create custom models for its online answer engine. The team utilized several data processing and advanced model alignment techniques supported by NeMo:

- Within a few days of a new open-source model release, the team had a new Sonar model that improved 20% over the base model on search.
- Perplexity has fine-tuned frontier models, including the Llama and Mistral model families, and leverages retrieval-augmented generation (RAG) to deliver precise, concise answers grounded in the retrieved data. This level of customization has enabled Perplexity to achieve high accuracy and relevance in its AI applications.
- NeMo's ease of use, breadth of supported model architectures, and high training throughput allowed Perplexity to quickly experiment and find the best-tuned models for its applications.
- NeMo allowed Perplexity to scale LLM fine-tuning from 0.5B parameters to 400B+ parameters by taking advantage of large-scale distributed data and model parallelism.
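The retrieval-augmented generation pattern mentioned above follows a simple shape: retrieve the documents most similar to the query, then build a prompt that instructs the LLM to answer only from that retrieved context. The sketch below is a toy illustration of that pattern, not Perplexity's production stack; it substitutes bag-of-words term counts for a learned embedding model, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase term-frequency counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(retrieve(query, documents, k))
    return (f"Context:\n{context}\n\n"
            f"Answer using only the context above.\nQuestion: {query}")

docs = [
    "NeMo supports supervised fine-tuning and direct preference optimization.",
    "The Eiffel Tower is located in Paris.",
]
prompt = build_prompt("Where is the Eiffel Tower?", docs)
```

In a real system the `embed` step is exactly what an open-source embedding model provides, which is why fine-tuning those models (as described in the quotes below) directly improves answer quality: better embeddings mean the right documents reach the prompt.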
AI research engineer Weihua Hu has been leading an effort to improve Perplexity’s retrieval capabilities and says, “NeMo allows Perplexity to quickly finetune a variety of open-source embedding models. This greatly enhanced our retrieval stack and led to a significant boost in answer quality.”
Weihua also noted, “We were able to experiment with several post-training techniques and find the right mix of supervised fine-tuning (SFT) and direct preference optimization (DPO).”
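Of the two post-training techniques Weihua mentions, DPO is the less familiar: instead of training a separate reward model, it optimizes a pairwise loss that pushes the policy to prefer the chosen response over the rejected one more strongly than a frozen reference model does. The sketch below computes that loss for a single preference pair, assuming summed log-probabilities are already available; the function names are illustrative, not NeMo's API.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being tuned and a frozen reference model.
    """
    # Implicit reward of each response: how much more the policy likes it
    # than the reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): small when the policy widens the
    # preference gap in favor of the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does -> low loss.
low = dpo_loss(-5.0, -9.0, ref_chosen_logp=-6.0, ref_rejected_logp=-7.0)
# Policy has drifted toward the rejected response -> high loss.
high = dpo_loss(-9.0, -5.0, ref_chosen_logp=-7.0, ref_rejected_logp=-6.0)
```

In practice SFT first anchors the model on demonstration data, and DPO then sharpens its preferences on pairwise comparisons; finding the right mix of the two stages is the experimentation the quote describes.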
By redefining how information is accessed, Perplexity aims to transform the way users interact with the web, making it more intuitive and user-friendly.