Industry: All Industries
Customer: Perplexity
Use Case: Generative AI / LLMs
Product: NVIDIA NeMo
Objective: Perplexity aims to quickly customize frontier models to improve the accuracy and quality of search results and to optimize them for lower latency and higher throughput, delivering a better user experience.
Perplexity is a groundbreaking AI-powered answer engine that provides accurate, trusted, and real-time answers to any question.
Although the internet provides access to a wealth of information and fields countless questions every year, the conventional approach to search requires users to sift through multiple sources to find and synthesize the insights they need.
To address this, Perplexity created an "answer engine" that offers a more efficient way to obtain information: when asked a question, it delivers a concise, relevant answer directly, saving users time and improving the search experience.
Every search has a different intent, and Perplexity relies on a network of large language models (LLMs) to generate grounded results. To enable this, the Perplexity team needed tools that could easily and efficiently scale the model customization process with advanced tuning techniques.
Key Takeaways
Perplexity adopted NVIDIA NeMo, leveraging its reliability, flexibility, and ease of use to create custom models for its online answer engine. The team utilized several data processing and advanced model alignment techniques supported by NeMo:

- Within a few days of a new open-source model release, the team had a new Sonar model that improved 20% over the base model on search.
- Perplexity has fine-tuned frontier models, including the Llama and Mistral model families, and leverages retrieval-augmented generation (RAG) to deliver precise, concise answers grounded in the retrieved data. This level of customization has enabled Perplexity to achieve high accuracy and relevance in its AI applications.
- NeMo's ease of use, breadth of supported model architectures, and high training throughput allowed Perplexity to quickly experiment and find the best-tuned models for its applications.
- NeMo allowed Perplexity to scale LLM fine-tuning from 0.5B parameters to 400B+ parameters by taking advantage of large-scale distributed data and model parallelism.
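The retrieval-augmented generation pattern mentioned above follows a simple shape: retrieve the documents most similar to the query, then build a prompt that instructs the LLM to answer only from that retrieved context. The sketch below is a toy illustration of that pattern, not Perplexity's production stack; it substitutes bag-of-words term counts for a learned embedding model, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase term-frequency counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(retrieve(query, documents, k))
    return (f"Context:\n{context}\n\n"
            f"Answer using only the context above.\nQuestion: {query}")

docs = [
    "NeMo supports supervised fine-tuning and direct preference optimization.",
    "The Eiffel Tower is located in Paris.",
]
prompt = build_prompt("Where is the Eiffel Tower?", docs)
```

In a real system the `embed` step is exactly what an open-source embedding model provides, which is why fine-tuning those models (as described in the quotes below) directly improves answer quality: better embeddings mean the right documents reach the prompt.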
AI research engineer Weihua Hu has been leading an effort to improve Perplexity’s retrieval capabilities and says, “NeMo allows Perplexity to quickly finetune a variety of open-source embedding models. This greatly enhanced our retrieval stack and led to a significant boost in answer quality.”
Weihua also noted, “We were able to experiment with several post-training techniques and find the right mix of supervised fine-tuning (SFT) and direct preference optimization (DPO).”
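Of the two post-training techniques Weihua mentions, DPO is the less familiar: instead of training a separate reward model, it optimizes a pairwise loss that pushes the policy to prefer the chosen response over the rejected one more strongly than a frozen reference model does. The sketch below computes that loss for a single preference pair, assuming summed log-probabilities are already available; the function names are illustrative, not NeMo's API.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen and rejected
    responses under the policy being tuned and a frozen reference model.
    """
    # Implicit reward of each response: how much more the policy likes it
    # than the reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin): small when the policy widens the
    # preference gap in favor of the chosen response.
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy prefers the chosen response more than the reference does -> low loss.
low = dpo_loss(-5.0, -9.0, ref_chosen_logp=-6.0, ref_rejected_logp=-7.0)
# Policy has drifted toward the rejected response -> high loss.
high = dpo_loss(-9.0, -5.0, ref_chosen_logp=-7.0, ref_rejected_logp=-6.0)
```

In practice SFT first anchors the model on demonstration data, and DPO then sharpens its preferences on pairwise comparisons; finding the right mix of the two stages is the experimentation the quote describes.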
By redefining how information is accessed, Perplexity aims to transform the way users interact with the web, making it more intuitive and user-friendly.