Numba: Unleashing the Power of Python for High-Performance Computing

Programming
August 1, 2023

Elias Owis

Software Engineer

Introduction:

Python, with its user-friendly syntax and extensive libraries, is widely used across many domains. However, because the standard implementation interprets code rather than compiling it to machine code, it often hits performance bottlenecks on computationally intensive tasks. Traditionally, developers have turned to languages like C++, C#, Rust and JavaScript for improved execution speed. In this article, we explore Numba, a library that lets Python compete with these compiled and JIT-compiled languages by applying just-in-time (JIT) compilation to numerical Python functions. We will look at Numba’s features, compare Python with Numba against the other languages on a prime-counting benchmark, walk through additional examples of Numba’s capabilities, and discuss when and where Numba is worth using.

The Power of Numba:

Numba, an open-source project backed by Anaconda, provides a JIT compiler that translates a subset of Python and NumPy code into optimized machine code. Instead of interpreting a decorated function, Numba compiles it the first time it is called with a given set of argument types, using the LLVM compiler infrastructure, and reuses the compiled version on later calls. The result is native machine code that can approach the performance of compiled languages like C++.
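
To make the on-the-fly compilation concrete, here is a minimal sketch that is separate from the benchmark code in this article. The names `sum_of_squares` and `timed_call` are illustrative, and the `ImportError` fallback is an assumption added so the snippet still runs as plain Python where Numba is not installed. The first call pays the compilation cost; the second reuses the machine code already generated for that argument type.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to a no-op decorator (plain Python).
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A plain Python loop; Numba compiles it for the argument type seen at call time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_call(n):
    # Return the result together with the wall-clock time of one call.
    started = time.perf_counter()
    value = sum_of_squares(n)
    return value, time.perf_counter() - started


first_value, first_seconds = timed_call(1_000_000)    # includes JIT compilation
second_value, second_seconds = timed_call(1_000_000)  # reuses the compiled code
print("first call: ", first_seconds, "seconds")
print("second call:", second_seconds, "seconds")
```

With Numba installed, the second call is typically much faster than the first, because only the first one includes compilation.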

Code Example — A Complicated Algorithm:

Let’s create a code example for a complicated algorithm that performs a brute-force search to count all prime numbers within a given range. Brute-force searching for prime numbers can be computationally expensive, especially for larger ranges. We’ll implement this algorithm in C++, C#, JavaScript, Rust, Python without Numba, and Python with Numba. We will compare the performance and execution time of these implementations.

Prime Number Algorithm Explanation:

A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Brute-force searching involves checking each number within the given range to determine if it is prime. We’ll use a simple function to check for prime numbers.

To evaluate the performance of the implementations, we’ll run the algorithms with a large list of numbers and measure the execution time.

(The execution times reported below were measured on my laptop.)
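
For the Python runs, the measurement step can be sketched as follows. `time.perf_counter()` is the standard-library clock intended for timing short durations; the helper name `best_of`, the stand-in workload, and the choice of three repeats are assumptions for illustration, not the procedure used for the numbers reported in this article.

```python
import time


def best_of(func, *args, repeats=3):
    # Run func(*args) several times and keep the fastest wall-clock time,
    # which reduces noise from other processes on the machine.
    best_seconds = float("inf")
    result = None
    for _ in range(repeats):
        started = time.perf_counter()
        result = func(*args)
        best_seconds = min(best_seconds, time.perf_counter() - started)
    return result, best_seconds


def count_multiples_of_three(limit):
    # Small stand-in workload so the helper has something to time.
    return sum(1 for n in range(limit) if n % 3 == 0)


result, seconds = best_of(count_multiples_of_three, 1_000_000)
print("Result:", result)
print("Best of 3:", seconds, "seconds")
```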

C++ Implementation:

#include <iostream>
#include <vector>
#include <ctime>

bool is_prime(int num)
{
    if (num <= 1)
        return false;
    for (int i = 2; i * i <= num; ++i)
    {
        if (num % i == 0)
            return false;
    }
    return true;
}

int find_primes(int start, int end)
{
    int count = 0;
    for (int num = start; num <= end; ++num)
    {
        if (is_prime(num))
        {
            count++;
        }
    }
    return count;
}

int main()
{
    int start = 0;
    int end = 10000000;
    // Find primes and measure execution time
    clock_t start_time = clock();
    int primes_count = find_primes(start, end);
    clock_t end_time = clock();
    double execution_time = static_cast<double>(end_time - start_time) / CLOCKS_PER_SEC;
    std::cout << "Execution time: " << execution_time << " seconds" << std::endl;
    // Print the counted primes (the unused, always-empty vector has been removed)
    std::cout << "Total prime numbers found: " << primes_count << std::endl;
    return 0;
}

C++ Execution Time: 8.9 seconds

C# Implementation:

using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Program
{
    public static bool IsPrime(int num)
    {
        if (num <= 1) return false;
        for (int i = 2; i * i <= num; ++i)
        {
            if (num % i == 0) return false;
        }
        return true;
    }

    public static int FindPrimes(int start, int end)
    {
        int count = 0;
        for (int num = start; num <= end; ++num)
        {
            if (IsPrime(num))
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        int start = 0;
        int end = 10000000;
        // Find primes and measure execution time
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();
        int primes_count = FindPrimes(start, end);
        stopwatch.Stop();
        double executionTime = stopwatch.Elapsed.TotalSeconds;
        Console.WriteLine("Execution time: " + executionTime + " seconds");
        Console.WriteLine("Total prime numbers found: " + primes_count);
    }
}

C# Execution Time: 9.0 seconds

Rust Implementation:

use std::time::Instant;

fn is_prime(num: i32) -> bool {
    if num <= 1 {
        return false;
    }
    for i in 2..=((num as f64).sqrt() as i32) {
        if num % i == 0 {
            return false;
        }
    }
    true
}

fn find_primes(start: i32, end: i32) -> i32 {
    let mut count = 0;
    for num in start..=end {
        if is_prime(num) {
            count += 1;
        }
    }
    count
}

fn main() {
    let start = 0;
    let end = 10000000;
    // Find primes and measure execution time
    let start_time = Instant::now();
    let primes_count = find_primes(start, end);
    let end_time = Instant::now();
    let execution_time = end_time.duration_since(start_time).as_secs_f64();
    println!("Execution time: {} seconds", execution_time);
    println!("Total prime numbers found: {}", primes_count);
}

Rust Execution Time: 16.2 seconds

JavaScript Implementation:

function isPrime(num) {
  if (num <= 1) return false;
  for (let i = 2; i * i <= num; ++i) {
    if (num % i === 0) return false;
  }
  return true;
}

function findPrimes(start, end) {
  let count = 0;
  for (let num = start; num <= end; ++num) {
    if (isPrime(num)) {
      count++;
    }
  }
  return count;
}

function main() {
  const start = 0;
  const end = 10000000;
  // Find primes and measure execution time
  const startTime = new Date();
  const primes_count = findPrimes(start, end);
  const endTime = new Date();
  const executionTime = (endTime - startTime) / 1000;
  console.log("Execution time:", executionTime, "seconds");
  console.log("Total prime numbers found:", primes_count);
}

main();

JS Execution Time: 8.9 seconds

Python Implementation (Without Numba):

import time

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

def find_primes(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Python Execution Time: 101.9 seconds (too slow)

Python Implementation (With Numba):

import time
import numba

@numba.jit
def is_prime_numba(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

@numba.njit(fastmath=True, cache=True, parallel=True)
def find_primes_numba(start, end):
    # return [num for num in numba.prange(start, end + 1) if is_prime_numba(num)]
    
    count = 0
    for num in numba.prange(start, end + 1):
        if is_prime_numba(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes_numba(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time (with Numba):", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Python with Numba Execution Time: 2.3 seconds (the fastest)

Results and Explanation:

After running the provided code, we observed that Python with Numba outperformed the C++ implementation in execution time when counting the primes in the range 0 to 10,000,000. This result might seem surprising at first, since C++ is a compiled language and is traditionally far faster than Python. Note, however, that the comparison is not like-for-like: the Numba version runs its main loop in parallel across CPU cores, while the other implementations are single-threaded as written. With Numba’s just-in-time (JIT) compilation and parallel processing features, Python code can achieve significant speedups.

Numba’s @numba.jit decorator compiles the decorated function to machine code, and the parallel=True option of @numba.njit enables automatic parallelization of loops written with numba.prange. Together they allow the Python code to be heavily optimized for numerical, computationally intensive tasks such as prime number searching.

During execution, Numba translates the Python code into optimized machine code, removing most of the overhead of Python’s interpreter. In addition, numba.prange distributes the loop iterations across multiple CPU cores, and the count accumulator is combined across threads as a reduction.

As a result, in this benchmark Python with Numba surpasses the performance of the single-threaded C++ implementation, showing how Numba can extend Python’s capabilities for numerical computations and computationally demanding algorithms. This combination of simplicity and performance makes Python with Numba a strong choice for tasks that require both speed and ease of development. It allows developers to write high-level Python code while achieving performance that was traditionally associated with lower-level languages like C++.
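
The two effects described above, compilation and parallel execution, can be separated with a sketch like the one below. It is a reduced variant of the benchmark, not the code that produced the timings: the limit of 100,000, the function names, and the warm-up calls are assumptions, and the `ImportError` fallback (plain Python, with `prange` replaced by `range`) is only there so the snippet runs without Numba.

```python
import time

try:
    from numba import njit, prange
except ImportError:
    # Assumption: no Numba available, so run everything as plain Python.
    prange = range

    def njit(**_options):
        return lambda func: func


@njit()
def is_prime(num):
    if num <= 1:
        return False
    i = 2
    while i * i <= num:
        if num % i == 0:
            return False
        i += 1
    return True


@njit()
def count_primes_serial(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count


@njit(parallel=True)
def count_primes_parallel(start, end):
    count = 0
    # prange splits the iterations across threads; count is a supported reduction.
    for num in prange(start, end + 1):
        if is_prime(num):
            count += 1
    return count


def timed(func, start, end):
    started = time.perf_counter()
    value = func(start, end)
    return value, time.perf_counter() - started


# Warm-up calls trigger compilation so the timed calls measure execution only.
count_primes_serial(0, 10)
count_primes_parallel(0, 10)

serial_count, serial_seconds = timed(count_primes_serial, 0, 100_000)
parallel_count, parallel_seconds = timed(count_primes_parallel, 0, 100_000)
print("serial:  ", serial_count, "primes in", serial_seconds, "seconds")
print("parallel:", parallel_count, "primes in", parallel_seconds, "seconds")
```

Comparing the serial and parallel timings on the same machine shows how much of a speedup comes from compilation alone and how much from using several cores.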

Numba Use Cases:

Numba excels in scenarios where performance is critical, and numerical computations, simulations, and scientific calculations form a significant part of the workload. It shines in the following use cases:

Scientific Computing:

Numba enhances the performance of complex scientific algorithms, simulations, and data analysis tasks, providing a significant boost to researchers and scientists.

Example — Numerical Integration (Trapezoidal Rule) Explanation:
The trapezoidal rule is a numerical integration method used to approximate the definite integral of a function. It divides the area under the curve of the function into trapezoids and sums up their areas to approximate the integral.

import time
import numba

def f(x):
    # The function to be integrated
    return x**2

def numerical_integration_without_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral


@numba.jit
def g(x):
    # The function to be integrated
    return x**2

@numba.jit
def numerical_integration_with_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral

def main():
    a = 0.0  # Lower limit of integration
    b = 1.0  # Upper limit of integration
    n = 10000000  # Number of trapezoids
    
    # Without Numba
    start_time = time.time()
    result_without_numba = numerical_integration_without_numba(f, a, b, n)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Numerical Integration without Numba:")
    print("Result:", result_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    result_with_numba = numerical_integration_with_numba(g, a, b, n)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Numerical Integration with Numba:")
    print("Result:", result_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Execution time:

Without Numba: 2.3 seconds

With Numba: 0.3 seconds
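
As a small sanity check on the method itself, the rule can be worked through for f(x) = x² on [0, 1] with only four trapezoids. The sketch below is illustrative and separate from the timed code; the function name `trapezoid` is an assumption. For a quadratic integrand the error of the composite rule is h²/12 · (f′(b) − f′(a)), which here equals 1/96.

```python
def trapezoid(f, a, b, n):
    # Composite trapezoidal rule with n equal-width trapezoids.
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


approx = trapezoid(lambda x: x**2, 0.0, 1.0, 4)
exact = 1.0 / 3.0
print("approximation:", approx)   # 0.34375
print("error:", approx - exact)   # close to 1/96
```

Because the error shrinks with h², the ten million trapezoids used in the timed example give a result that agrees with 1/3 to many decimal places.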

Machine Learning and Data Science:

Numba can accelerate various machine learning algorithms, particularly those involving array computations and linear algebra operations, leading to faster model training and predictions.

Example — Linear Regression: Linear regression is a popular supervised learning algorithm used for predicting a continuous target variable based on one or more predictor variables. In this example, we’ll perform simple linear regression with one predictor variable.

import numpy as np
import time
import numba


def linear_regression_without_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)

    numerator = 0.0
    denominator = 0.0

    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2

    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept


@numba.jit
def linear_regression_with_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)

    numerator = 0.0
    denominator = 0.0

    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2

    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

def main():
    # Generate a large dataset
    np.random.seed(0)
    X = np.random.rand(10000000)  # Predictor variable
    y = 2 * X + 3 + np.random.randn(10000000)  # Target variable (with some noise)

    # Without Numba
    start_time = time.time()
    slope, intercept = linear_regression_without_numba(X, y)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Linear Regression without Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    slope, intercept = linear_regression_with_numba(X, y)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Linear Regression with Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Execution time:

Without Numba: 7.7 seconds

With Numba: 0.5 seconds
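
For reference, the same closed-form fit can also be written without an explicit loop. The sketch below uses only NumPy array operations and a tiny noise-free dataset (an assumption chosen so the expected slope and intercept are obvious); the function name is illustrative. A vectorized version like this is a useful baseline when judging how much a JIT-compiled loop actually gains.

```python
import numpy as np


def linear_regression_vectorized(X, y):
    # slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    x_centered = X - X.mean()
    y_centered = y - y.mean()
    slope = np.dot(x_centered, y_centered) / np.dot(x_centered, x_centered)
    intercept = y.mean() - slope * X.mean()
    return slope, intercept


X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X + 3.0  # exact line, no noise
slope, intercept = linear_regression_vectorized(X, y)
print("Slope:", slope)          # 2.0
print("Intercept:", intercept)  # 3.0
```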

Computational Physics and Engineering:

Numba proves invaluable for simulations and solving differential equations, enabling engineers and physicists to achieve results efficiently.

Example — Simulation of Particle Motion with Constant Force Explanation: In this example, we’ll simulate the motion of a particle moving under the influence of a constant force. We’ll use the equations of motion to update the particle’s position and velocity over time.

import time
import numba


def simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


@numba.jit
def simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


def main():
    # Particle parameters
    mass = 1.0
    initial_position = 0.0
    initial_velocity = 0.0
    constant_force = 10.0

    # Simulation parameters
    time_step = 0.01
    num_steps = 10000000

    # Without Numba
    start_time = time.time()
    final_position_without_numba = simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Simulation without Numba:")
    print("Final Position:", final_position_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    final_position_with_numba = simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Simulation with Numba:")
    print("Final Position:", final_position_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()

Execution time:

Without Numba: 0.8 seconds

With Numba: 0.2 seconds
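
The update rule in the loop has a closed form that makes a handy correctness check. Because the velocity is updated before the position, after n steps the position is x0 + v0·n·Δt + a·Δt²·n(n+1)/2, which approaches the textbook x0 + v0·t + ½·a·t² as Δt shrinks. The sketch below is illustrative (the function names are assumptions) and uses a smaller step count than the timed example.

```python
import math


def simulate(mass, x0, v0, force, dt, steps):
    # Same update order as the loop above: velocity first, then position.
    position, velocity = x0, v0
    for _ in range(steps):
        velocity += (force / mass) * dt
        position += velocity * dt
    return position


def closed_form(mass, x0, v0, force, dt, steps):
    # Exact sum of the discrete updates performed by simulate().
    a = force / mass
    return x0 + v0 * steps * dt + a * dt * dt * steps * (steps + 1) / 2


simulated = simulate(1.0, 0.0, 0.0, 10.0, 0.01, 1000)
expected = closed_form(1.0, 0.0, 0.0, 10.0, 0.01, 1000)
continuous = 0.5 * 10.0 * (1000 * 0.01) ** 2  # x = a * t^2 / 2 with t = 10 s
print("simulated: ", simulated)
print("discrete:  ", expected)
print("continuous:", continuous)
```

The small gap between the discrete and continuous values is the discretization error of the time stepping, not a floating-point or compilation artifact.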

Financial Modeling:

Numba can be employed to optimize financial calculations, such as option pricing, portfolio optimization, and risk analysis, facilitating real-time decision-making.

Example — Option Pricing with Monte Carlo Simulation Explanation: Monte Carlo simulation is a widely used technique for option pricing in finance. It involves simulating the future stock price using random walks and then calculating the option payoff based on the simulated stock prices.

import numpy as np
import time
import numba


def option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

    # Discount the average payoff back to today at the risk-free rate
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


@numba.jit
def option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

    # Discount the average payoff back to today at the risk-free rate
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


def main():
    # Option parameters
    S0 = 100.0  # Initial stock price
    K = 100.0   # Strike price
    r = 0.05    # Risk-free interest rate
    sigma = 0.2 # Volatility (standard deviation of returns)
    T = 1.0     # Time to expiration (in years)

    # Monte Carlo simulation parameters
    num_simulations = 100000  # Number of simulations
    num_steps = 252           # Number of steps (days) for each simulation
    
    # Without Numba
    start_time = time.time()
    option_price_without_numba = option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Option Pricing without Numba:")
    print("Option Price:", option_price_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    option_price_with_numba = option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Option Pricing with Numba:")
    print("Option Price:", option_price_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Execution time:

Without Numba: 78.3 seconds

With Numba: 1.4 seconds
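
A Monte Carlo estimate for a European call can be checked against the Black-Scholes closed form, which gives the discounted expected payoff under the same lognormal model. The sketch below uses only the standard library; the function names are assumptions for illustration, and the parameters match the ones used above. With enough simulated paths, a discounted Monte Carlo estimate should land close to this value.

```python
import math


def normal_cdf(x):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    # Closed-form price of a European call on a non-dividend-paying stock.
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * normal_cdf(d1) - K * math.exp(-r * T) * normal_cdf(d2)


price = black_scholes_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print("Black-Scholes call price:", round(price, 4))  # approximately 10.45
```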

Parallelization:

As the prime-counting benchmark above showed, Numba’s support for parallel processing allows developers to make use of multicore processors and tackle large parallel computations efficiently.

Example — Matrix Multiplication with Parallelization Explanation: Matrix multiplication is a computationally intensive task that can benefit from parallelization. We’ll use Numba’s numba.prange function to parallelize the outermost of the three nested loops, so that different rows of the result are computed on different CPU cores.

import numpy as np
import time
import numba


def matrix_multiply_without_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result


@numba.njit(parallel=True)
def matrix_multiply_with_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in numba.prange(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result

def main():
    # Generate large random matrices
    size = 200
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)
    
    # Without Numba
    start_time = time.time()
    result_without_numba = matrix_multiply_without_numba(A, B)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Matrix Multiplication without Numba:")
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba Parallelization
    start_time = time.time()
    result_with_numba = matrix_multiply_with_numba(A, B)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Matrix Multiplication with Numba Parallelization:")
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()

Execution time:

Without Numba: 4.5 seconds

With Numba: 0.9 seconds
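
Whatever loop variant is used, its output is easy to validate against NumPy’s built-in matrix product. The sketch below is an illustrative check on small matrices (the sizes, the seed, and the function name are assumptions). Each outer iteration writes only to its own row of the result, which is the property that makes the outer loop safe to run in parallel.

```python
import numpy as np


def matrix_multiply_loops(A, B):
    # Textbook triple loop: result[i, j] = sum over k of A[i, k] * B[k, j].
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)
    for i in range(m):  # each i touches only row i of result
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]
    return result


rng = np.random.default_rng(0)
A = rng.random((20, 30))
B = rng.random((30, 10))
looped = matrix_multiply_loops(A, B)
print("matches A @ B:", np.allclose(looped, A @ B))
```

In practice `A @ B` is also the baseline to beat, since NumPy delegates it to an optimized linear algebra library.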

In all of these test cases, the Numba-compiled functions ran noticeably faster than their pure-Python counterparts, with speedups ranging from about 4x for the particle simulation to more than 50x for the Monte Carlo option pricing. These timings include Numba’s one-time compilation cost, because each function is timed on its first call, so repeated calls should be faster again. For the same reason, the benefit becomes more pronounced as the data size grows: the fixed compilation cost is spread over more work. Numba is a valuable asset wherever execution speed is crucial, such as scientific computing, machine learning, computational physics, financial modeling, and parallel processing, and its combination of just-in-time compilation and parallel execution makes it especially useful for large, loop-heavy numerical workloads.

GitHub Repositories for Code Comparison and Numba Use Cases:

If you’re interested in exploring the code comparisons between Python with Numba and other programming languages or delving deeper into various Numba use cases, you can find the relevant code and examples in the following GitHub repositories:

Python-Numba-vs-Other-Languages:
https://github.com/Eng-Elias/Python-Numba-vs-Other-Languages
Numba-Use-Cases:
https://github.com/Eng-Elias/Numba-Use-Cases

Feel free to explore and contribute to these repositories, fork them, and experiment with the code to gain insights into the potential of Numba for accelerating your own Python projects. Whether you’re a data scientist, software engineer, or programming enthusiast, these repositories aim to offer valuable resources for harnessing Numba’s speed and efficiency in your computational endeavors.

By sharing code comparisons and practical use cases, we hope to encourage and inspire the adoption of Numba in diverse fields, enabling developers to unlock the full potential of Python as a high-performance language.

Happy coding and optimizing!

Conclusion:

Numba has undoubtedly proven to be a game-changer for Python developers seeking enhanced performance in computationally intensive tasks. By leveraging Numba’s JIT compilation capabilities, Python can compete with traditionally faster languages like C++, C#, Rust and JavaScript. However, it’s essential to consider the nature of the task at hand when deciding whether to use Numba or not. For numerical computations, simulations, scientific calculations, and algorithms that can benefit from parallelization, Numba can be a valuable addition to the Python developer’s toolbox. When performance is a critical factor, Numba empowers Python developers to achieve optimal execution speeds without sacrificing Python’s simplicity and expressiveness.

Resources:

Numba: A High Performance Python Compiler (pydata.org)

https://github.com/Eng-Elias/Python-Numba-vs-Other-Languages

https://github.com/Eng-Elias/Numba-Use-Cases
