Polars

What Is Polars?

Polars is an open-source DataFrame library for data manipulation and analysis. It is implemented in Rust and uses Apache Arrow’s columnar memory format for efficient data processing. The library provides a structured, typed API that supports a wide range of data transformations. Polars is designed to maximize computational efficiency and supports various file formats (such as CSV, Parquet, and JSON) and data storage layers (such as local disks and cloud object storage), making it compatible with modern workflows.

Why Use Polars

Polars is rising in popularity for processing medium- to large-sized tabular data efficiently on a single node. Key reasons for using Polars include:

  • High performance from the Rust implementation, which uses parallel execution and SIMD instructions
  • A lazy evaluation engine whose optimized query plans minimize data movement, allowing hundreds of gigabytes of data to be processed on a single node
  • An expressive API that supports complex operations and a wide range of native data types
  • Out-of-core processing through the streaming API for datasets that exceed available RAM
  • Interoperability with other tools and libraries through the Apache Arrow columnar memory format

How Polars Works

Polars organizes data into a strict schema, in which every column has a single data type, and processes it using either lazy execution (through the LazyFrame API) or eager execution. The query engine uses vectorized and SIMD (single instruction, multiple data) techniques to speed up computation. In lazy mode, the query optimizer analyzes and rewrites query plans, for example by pushing filters and column selections down toward the data source, to improve execution efficiency. Polars supports data serialization with formats such as Parquet and leverages Apache Arrow for efficient data exchange. Its implementation in Rust enables parallel task execution and optimized memory usage.

Real-World Applications of Polars

Polars is used in various use cases requiring efficient data analysis, including:

  • ETL pipelines: Transforming and cleaning large datasets for analytics or machine learning
  • Real-time analytics: Processing streaming data with low latency
  • Big data workflows: Handling datasets that exceed system memory capacity
  • Financial analysis: Analyzing high-frequency trading data or performing complex calculations
  • Research: Managing structured and semi-structured data for experimental studies
  • Data engineering: Developing workflows across storage systems like cloud platforms and databases

GPU-Accelerated Polars

GPUs are massively parallel processors with thousands of cores designed to handle many tasks simultaneously, whereas CPUs have fewer cores optimized for sequential processing.

[Figure: The difference between a CPU and a GPU]

NVIDIA RAPIDS™ is an open-source data analytics and machine learning acceleration platform that enables GPU parallelism for end-to-end data science pipelines. RAPIDS cuDF, a Python GPU DataFrame library built on Apache Arrow, is integrated with Polars to accelerate Polars workloads on NVIDIA GPUs. With this integration, data scientists can run their Polars queries on GPUs by changing a single function parameter.

Through the Polars GPU engine, the Polars query optimizer can take advantage of NVIDIA GPUs, speeding up workloads dominated by operations such as group-bys, joins, and string processing by up to 13x. If a query includes operations the GPU engine does not support, Polars gracefully falls back to CPU execution, preserving compatibility while still running supported queries on the GPU.

NVIDIA will primarily maintain the GPU engine, with the NVIDIA RAPIDS and Polars teams collaborating to ensure smooth integration. To run a Polars query on NVIDIA GPUs, data scientists pass the optional engine="gpu" parameter to the LazyFrame .collect() method.

This advancement combines CPUs’ strength in sequential processing with GPUs’ efficiency in parallel processing, offering a strong option for large-scale data operations and deep learning tasks. The GPU engine marks a significant step in expanding Polars’ capabilities for high-performance computing, and more technical details and general availability will be announced as development progresses.

Next Steps

See how to get started with Polars

Get started with Polars and explore how GPU acceleration with NVIDIA RAPIDS can enhance your data science workflows.

Dive Deeper into GPU-Accelerated Polars

Learn more about accelerating Polars and getting started with the Polars GPU engine.

Explore the Tutorial Notebook

Get hands-on with our introductory tutorial on the Polars GPU engine, powered by cuDF.
