Python is a high-productivity dynamic programming language that’s widely used in data science. It’s extremely popular due to its clean and expressive syntax, standard data structures, comprehensive standard library, excellent documentation, broad ecosystem of libraries and tools, and large and open community. Perhaps most important, though, is the high productivity that a dynamically typed, interpreted language like Python enables.
But Python's greatest strength can also be its greatest weakness: its flexible, dynamically typed, high-level syntax can result in poor performance for data- and computation-intensive programs, because native, compiled code runs many times faster than dynamic, interpreted code. For this reason, Python programmers concerned about efficiency often rewrite their innermost loops in C and call the compiled C functions from Python. Several projects aim to make this optimization easier, such as Cython, but they often require learning a new syntax. Cython can improve performance significantly, but it may require painstaking manual modification of the Python code.
Numba was conceived as a much simpler alternative to Cython. One of its most appealing traits is that it doesn't require learning a new syntax, replacing the Python interpreter, running a separate compilation step, or having a C/C++ compiler installed. Applying Numba's @jit decorator to a Python function is enough: the function is compiled at run time ("just-in-time", or JIT, compilation). Because Numba compiles code dynamically, you don't give up the flexibility of Python. At the same time, numerical algorithms compiled by Numba can approach the speed of programs written in compiled C or FORTRAN, and they can run up to 100 times faster than the same procedures executed by the standard Python interpreter. This is a huge step toward providing the ideal combination of high-productivity programming and high-performance computing.
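As a rough sketch of what this looks like in practice (the Monte Carlo function and sample count below are illustrative, not taken from this article):

```python
# A minimal sketch of Numba's @jit decorator (illustrative example).
# The first call triggers just-in-time compilation; subsequent calls
# reuse the compiled machine code.
import numpy as np
from numba import jit

@jit(nopython=True)  # compile to native code, bypassing the Python interpreter
def monte_carlo_pi(n_samples):
    # Estimate pi by sampling random points in the unit square.
    inside = 0
    for _ in range(n_samples):
        x = np.random.random()
        y = np.random.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(monte_carlo_pi(1_000_000))  # compiled on the first call, fast thereafter
```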
[Figure: Numba execution diagram]
Numba is designed for array-oriented computing tasks, much like the widely used NumPy library. The data parallelism in array-oriented computing tasks is a natural fit for accelerators like GPUs. Numba understands NumPy array types and uses them to generate efficient compiled code for execution on GPUs or multicore CPUs. The programming effort required can be as simple as adding the @vectorize function decorator, which instructs Numba to generate a compiled, vectorized version of the function at run time; the resulting function can then process arrays of data in parallel on the GPU, as in the sketch below.
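A minimal sketch of @vectorize with the CUDA target (the function name and data are illustrative; a CUDA-capable GPU is assumed, while target='cpu' or 'parallel' work without one):

```python
# A minimal sketch of @vectorize targeting the GPU (illustrative example).
import numpy as np
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def add_gpu(a, b):
    # Numba turns this scalar function into a NumPy ufunc that runs
    # element-wise on the GPU.
    return a + b

x = np.arange(16, dtype=np.float32)
y = 2.0 * x
print(add_gpu(x, y))  # element-wise addition executed on the GPU
```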
In addition to JIT compiling NumPy array code for the CPU or GPU, Numba exposes "CUDA Python": the NVIDIA® CUDA® programming model for NVIDIA GPUs in Python syntax. By speeding up Python, Numba extends its reach from a glue language to a complete programming environment that can execute numeric code efficiently.
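As a sketch of what CUDA Python looks like (the kernel, array size, and launch configuration below are assumptions for illustration), a hand-written kernel can be defined and launched directly from Python:

```python
# A minimal sketch of CUDA Python using numba.cuda (illustrative kernel and sizes).
import numpy as np
from numba import cuda

@cuda.jit
def double_elements(arr):
    # Each CUDA thread handles one array element.
    i = cuda.grid(1)
    if i < arr.size:
        arr[i] *= 2.0

data = np.arange(1024, dtype=np.float32)
d_data = cuda.to_device(data)  # copy the array to GPU memory
threads_per_block = 128
blocks = (data.size + threads_per_block - 1) // threads_per_block
double_elements[blocks, threads_per_block](d_data)  # launch the kernel
print(d_data.copy_to_host()[:5])  # [0. 2. 4. 6. 8.]
```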
The combination of Numba with other tools in the Python data science ecosystem transforms the experience of GPU computing. The Jupyter Notebook provides a browser-based document creation environment that allows the combination of Markdown text, executable code, and graphical output of plots and images. Jupyter has become very popular for teaching, documenting scientific analyses, and interactive prototyping.
Numba has been tested on more than 200 different platform configurations. It runs on Windows, Apple Macintosh, and Linux operating systems on top of Intel and AMD x86, POWER8/9, and ARM CPUs, as well as both NVIDIA and AMD GPUs. Precompiled binaries are available for most systems.