Numba: Unleashing the Power of Python for High-Performance Computing

Elias Owis

Software Engineer

Introduction:

Python, with its user-friendly syntax and extensive libraries, is widely used across many domains. However, because it is interpreted, it often runs into performance bottlenecks on computationally intensive tasks. Traditionally, developers have turned to languages like C++, C#, Rust and JavaScript for better execution speed. In this article, we explore Numba, a library that lets Python compete with these languages through just-in-time (JIT) compilation. We will look at Numba’s features, compare Python with Numba against other languages, walk through additional examples of its capabilities, and discuss when and where it is worth using.

Numba, an open-source project backed by Anaconda, provides a JIT compiler that translates a subset of Python and NumPy code into optimized machine code. Instead of interpreting a decorated function, Numba compiles it the first time it is called with a given set of argument types, using the LLVM compiler infrastructure. The resulting native machine code can approach the performance of compiled languages like C++.
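To make the idea concrete, here is a minimal sketch of decorating a function for JIT compilation. The function name sum_of_squares is hypothetical and chosen only for illustration, and the try/except fallback is an assumption added so the sketch still runs (uncompiled) on a machine where Numba is not installed.

```python
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for illustration: fall back to plain Python if Numba is missing
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A tight numeric loop, the kind of code Numba compiles well
    total = 0
    for i in range(n):
        total += i * i
    return total


# The first call triggers compilation for integer input; later calls reuse it
print(sum_of_squares(10))  # 0 + 1 + 4 + ... + 81 = 285
```

The decorated function is called exactly like any other Python function; the compilation step happens behind the scenes on the first call.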

Code Example — A Complicated Algorithm:

Let’s create a code example for a computationally heavy algorithm: a brute-force search that counts all prime numbers within a given range. Brute-force searching for prime numbers can be expensive, especially for larger ranges. We’ll implement this algorithm in C++, C#, Rust, JavaScript, Python without Numba, and Python with Numba, and then compare the execution times of these implementations.

Prime Number Algorithm Explanation:

A prime number is a natural number greater than 1 that has no positive divisors other than 1 and itself. Brute-force searching involves checking each number within the given range to determine if it is prime. We’ll use a simple function to check for prime numbers.
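Before timing anything, it helps to confirm that an implementation counts correctly. The sketch below is one way to write the trial-division check in Python, using math.isqrt for the square-root bound (a choice made here for illustration, not the form used in the benchmarks), and it checks the count against known values: there are 25 primes up to 100 and 1,229 up to 10,000. For the full benchmark range of 0 to 10,000,000, a correct implementation should report 664,579 primes.

```python
import math


def is_prime(num):
    # Numbers below 2 are not prime by definition
    if num <= 1:
        return False
    # A composite number must have a divisor no larger than its square root
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True


def count_primes(start, end):
    # Brute force: test every number in the inclusive range
    return sum(1 for num in range(start, end + 1) if is_prime(num))


print(count_primes(0, 100))     # 25
print(count_primes(0, 10_000))  # 1229
```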

#include <iostream>
#include <vector>
#include <ctime>

bool is_prime(int num)
{
if (num <= 1)
return false;
for (int i = 2; i * i <= num; ++i)
{
if (num % i == 0)
return false;
}
return true;
}

int find_primes(int start, int end)
{
int count = 0;
for (int num = start; num <= end; ++num)
{
if (is_prime(num))
{
count++;
}
}
return count;
}

int main()
{
int start = 0;
int end = 10000000;
// Find primes and measure execution time
clock_t start_time = clock();
int primes_count = find_primes(start, end);
clock_t end_time = clock();
double execution_time = static_cast<double>(end_time - start_time) / CLOCKS_PER_SEC;
std::cout << "Execution time: " << execution_time << " seconds" << std::endl;
// Report the counted primes (the unused, always-empty vector has been removed)
std::cout << "Total prime numbers found: " << primes_count << std::endl;
return 0;
}

C++ Execution Time: 8.9 seconds

using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Program
{
public static bool IsPrime(int num)
{
if (num <= 1) return false;
for (int i = 2; i * i <= num; ++i)
{
if (num % i == 0) return false;
}
return true;
}

public static int FindPrimes(int start, int end)
{
int count = 0;
for (int num = start; num <= end; ++num)
{
if (IsPrime(num))
{
count++;
}
}
return count;
}

public static void Main()
{
int start = 0;
int end = 10000000;
// Find primes and measure execution time
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
int primes_count = FindPrimes(start, end);
stopwatch.Stop();
double executionTime = stopwatch.Elapsed.TotalSeconds;
Console.WriteLine("Execution time: " + executionTime + " seconds");
Console.WriteLine("Total prime numbers found: " + primes_count);
}
}

C# Execution Time: 9.0 seconds

use std::time::Instant;

fn is_prime(num: i32) -> bool {
if num <= 1 {
return false;
}
for i in 2..=((num as f64).sqrt() as i32) {
if num % i == 0 {
return false;
}
}
true
}

fn find_primes(start: i32, end: i32) -> i32 {
let mut count = 0;
for num in start..=end {
if is_prime(num) {
count += 1;
}
}
count
}

fn main() {
let start = 0;
let end = 10000000;
// Find primes and measure execution time
let start_time = Instant::now();
let primes_count = find_primes(start, end);
let end_time = Instant::now();
let execution_time = end_time.duration_since(start_time).as_secs_f64();
println!("Execution time: {} seconds", execution_time);
println!("Total prime numbers found: {}", primes_count);
}

Rust Execution Time: 16.2 seconds

function isPrime(num) {
if (num <= 1) return false;
for (let i = 2; i * i <= num; ++i) {
if (num % i === 0) return false;
}
return true;
}

function findPrimes(start, end) {
let count = 0;
for (let num = start; num <= end; ++num) {
if (isPrime(num)) {
count++;
}
}
return count;
}

function main() {
const start = 0;
const end = 10000000;
// Find primes and measure execution time
const startTime = new Date();
const primes_count = findPrimes(start, end);
const endTime = new Date();
const executionTime = (endTime - startTime) / 1000;
console.log("Execution time:", executionTime, "seconds");
console.log("Total prime numbers found:", primes_count);
}

main();

JS Execution Time: 8.9 seconds

import time

def is_prime(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

def find_primes(start, end):
    count = 0
    for num in range(start, end + 1):
        if is_prime(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time:", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Python Execution Time: 101.9 seconds (too slow)

import time
import numba

@numba.jit
def is_prime_numba(num):
    if num <= 1:
        return False
    for i in range(2, int(num**0.5) + 1):
        if num % i == 0:
            return False
    return True

@numba.njit(fastmath=True, cache=True, parallel=True)
def find_primes_numba(start, end):
    # return [num for num in numba.prange(start, end + 1) if is_prime_numba(num)]

    count = 0
    for num in numba.prange(start, end + 1):
        if is_prime_numba(num):
            count += 1
    return count

def main():
    start = 0
    end = 10000000
    # Find primes and measure execution time
    start_time = time.time()
    primes_count = find_primes_numba(start, end)
    end_time = time.time()
    execution_time = end_time - start_time
    print("Execution time (with Numba):", execution_time, "seconds")
    print("Total prime numbers found:", primes_count)

main()

Python with Numba Execution Time: 2.3 seconds (the fastest)

Results and Explanation:

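After running the provided code, we observed that Python with Numba outperformed the C++ implementation in execution time when counting the prime numbers in the range 0 to 10,000,000. This result might seem surprising at first, as C++ is traditionally known for superior performance compared to Python because it is a compiled language. Two things explain it. First, Numba’s just-in-time (JIT) compilation turns the Python loops into native machine code. Second, the Numba version is the only one that runs in parallel: it uses parallel=True with numba.prange to spread the loop across CPU cores, while the other implementations are single-threaded. The measured time for the Numba version also includes compilation on the first call (or loading the cached machine code on later runs, since cache=True is set). Exact numbers will vary with hardware, core count, and compiler optimization settings.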
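Because a JIT-compiled function pays a one-time compilation cost on its first call, it can be useful to time the first call separately from later calls. The helper below is a minimal sketch of that idea using only the standard library; the names benchmark and count_multiples_of_three are hypothetical and exist only for this illustration. It works with any callable, including a Numba-decorated function.

```python
import time


def benchmark(fn, *args, repeats=3):
    """Return (result, first_call_seconds, best_later_seconds)."""
    # First call: for a JIT-compiled function this includes compilation
    start = time.perf_counter()
    result = fn(*args)
    first = time.perf_counter() - start

    # Later calls: keep the best time to reduce noise from the system
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, first, best


def count_multiples_of_three(limit):
    # Stand-in workload; replace with the function you want to measure
    return sum(1 for n in range(limit) if n % 3 == 0)


result, first, best = benchmark(count_multiples_of_three, 100_000)
print("Result:", result)
print("First call:", first, "seconds; best later call:", best, "seconds")
```

For a plain Python function the two timings should be similar; for a Numba function the first call is typically noticeably slower than the later ones.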

Numba Use Cases:

Numba excels in scenarios where performance is critical, and numerical computations, simulations, and scientific calculations form a significant part of the workload. It shines in the following use cases:

Scientific Computing:

Numba enhances the performance of complex scientific algorithms, simulations, and data analysis tasks, providing a significant boost to researchers and scientists.

import time
import numba

def f(x):
    # The function to be integrated
    return x**2

def numerical_integration_without_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral


@numba.jit
def g(x):
    # The function to be integrated
    return x**2

@numba.jit
def numerical_integration_with_numba(f, a, b, n):
    h = (b - a) / n
    integral = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        x = a + i * h
        integral += f(x)
    integral *= h
    return integral

def main():
    a = 0.0  # Lower limit of integration
    b = 1.0  # Upper limit of integration
    n = 10000000  # Number of trapezoids

    # Without Numba
    start_time = time.time()
    result_without_numba = numerical_integration_without_numba(f, a, b, n)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Numerical Integration without Numba:")
    print("Result:", result_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    result_with_numba = numerical_integration_with_numba(g, a, b, n)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Numerical Integration with Numba:")
    print("Result:", result_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
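As a sanity check on the trapezoidal rule used above, the integral of x**2 from 0 to 1 is exactly 1/3, and for this integrand the trapezoidal error works out to h**2 / 6, where h is the width of one trapezoid. The short sketch below (the name trapezoid is hypothetical) verifies both with a small n, so it runs quickly without Numba.

```python
def trapezoid(f, a, b, n):
    # Composite trapezoidal rule with n equal-width trapezoids
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


n = 1000
approx = trapezoid(lambda x: x**2, 0.0, 1.0, n)
error = abs(approx - 1.0 / 3.0)
expected_error = (1.0 / n) ** 2 / 6.0  # h**2 / 6 for f(x) = x**2 on [0, 1]

print("Approximation:", approx)
print("Error:", error, "expected about:", expected_error)
```

The error shrinks with the square of the step size, which is why the full example uses a large n and why a fast loop matters.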

Machine Learning and Data Science:

Numba can accelerate various machine learning algorithms, particularly those involving array computations and linear algebra operations, leading to faster model training and predictions.

import numpy as np
import time
import numba


def linear_regression_without_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)

    numerator = 0.0
    denominator = 0.0

    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2

    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept


@numba.jit
def linear_regression_with_numba(X, y):
    n = len(X)
    X_mean = np.mean(X)
    y_mean = np.mean(y)

    numerator = 0.0
    denominator = 0.0

    for i in range(n):
        numerator += (X[i] - X_mean) * (y[i] - y_mean)
        denominator += (X[i] - X_mean) ** 2

    slope = numerator / denominator
    intercept = y_mean - slope * X_mean
    return slope, intercept

def main():
    # Generate a large dataset
    np.random.seed(0)
    X = np.random.rand(10000000)  # Predictor variable
    y = 2 * X + 3 + np.random.randn(10000000)  # Target variable (with some noise)

    # Without Numba
    start_time = time.time()
    slope, intercept = linear_regression_without_numba(X, y)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Linear Regression without Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    slope, intercept = linear_regression_with_numba(X, y)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Linear Regression with Numba:")
    print("Slope:", slope)
    print("Intercept:", intercept)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
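The regression above uses the closed-form least-squares formulas: the slope is the sum of (x - x_mean) * (y - y_mean) divided by the sum of (x - x_mean) squared, and the intercept is y_mean - slope * x_mean. A quick way to convince yourself the formulas are right is to feed them noise-free data from a known line. The sketch below (with the hypothetical name fit_line, written in plain Python so it needs no extra packages) recovers the slope 2 and intercept 3 used to generate the data.

```python
def fit_line(xs, ys):
    # Closed-form simple linear regression (ordinary least squares)
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    slope = numerator / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 3 for x in xs]  # points exactly on y = 2x + 3

slope, intercept = fit_line(xs, ys)
print("Slope:", slope, "Intercept:", intercept)
```

With the noisy 10,000,000-point dataset from the example, the fitted values should land close to 2 and 3 rather than matching them exactly.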

Computational Physics and Engineering:

Numba proves invaluable for simulations and solving differential equations, enabling engineers and physicists to achieve results efficiently.

import time
import numba


def simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


@numba.jit
def simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps):
    position = initial_position
    velocity = initial_velocity

    for _ in range(num_steps):
        acceleration = constant_force / mass
        velocity += acceleration * time_step
        position += velocity * time_step

    return position


def main():
    # Particle parameters
    mass = 1.0
    initial_position = 0.0
    initial_velocity = 0.0
    constant_force = 10.0

    # Simulation parameters
    time_step = 0.01
    num_steps = 10000000

    # Without Numba
    start_time = time.time()
    final_position_without_numba = simulate_particle_motion_without_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Simulation without Numba:")
    print("Final Position:", final_position_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    final_position_with_numba = simulate_particle_motion_with_numba(mass, initial_position, initial_velocity, constant_force, time_step, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Simulation with Numba:")
    print("Final Position:", final_position_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")


if __name__ == "__main__":
    main()
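The loop above updates the velocity first and then uses the new velocity to update the position, a scheme usually called semi-implicit Euler. For a constant force starting from rest, the exact answer is position = 0.5 * (force / mass) * t**2, so the numerical result can be checked directly. The sketch below uses a hypothetical simulate function with the same update order and a smaller step count so that it runs instantly.

```python
def simulate(mass, force, time_step, num_steps):
    # Same update order as the example: velocity first, then position
    position = 0.0
    velocity = 0.0
    acceleration = force / mass
    for _ in range(num_steps):
        velocity += acceleration * time_step
        position += velocity * time_step
    return position


mass, force, time_step, num_steps = 1.0, 10.0, 0.01, 1000

numeric = simulate(mass, force, time_step, num_steps)
total_time = time_step * num_steps
exact = 0.5 * (force / mass) * total_time**2

print("Numeric position:", numeric)  # about 500.5
print("Exact position:  ", exact)    # 500.0
```

The gap of about 0.5 comes from the discretization and shrinks as the time step gets smaller, which in turn means more steps and more work for the loop.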

Financial Modeling:

Numba can be employed to optimize financial calculations, such as option pricing, portfolio optimization, and risk analysis, facilitating real-time decision-making.

import numpy as np
import time
import numba


def option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

    # Discount the average payoff back to today at the risk-free rate
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


@numba.jit
def option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps):
    dt = T / num_steps
    total_payoff = 0.0

    for _ in range(num_simulations):
        S = S0
        for _ in range(num_steps):
            epsilon = np.random.normal(0.0, 1.0)
            S *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * epsilon)

        total_payoff += max(S - K, 0)

    # Discount the average payoff back to today at the risk-free rate
    option_price = np.exp(-r * T) * total_payoff / num_simulations
    return option_price


def main():
    # Option parameters
    S0 = 100.0  # Initial stock price
    K = 100.0  # Strike price
    r = 0.05  # Risk-free interest rate
    sigma = 0.2  # Volatility (standard deviation of returns)
    T = 1.0  # Time to expiration (in years)

    # Monte Carlo simulation parameters
    num_simulations = 100000  # Number of simulations
    num_steps = 252  # Number of steps (days) for each simulation

    # Without Numba
    start_time = time.time()
    option_price_without_numba = option_pricing_without_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Option Pricing without Numba:")
    print("Option Price:", option_price_without_numba)
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba
    start_time = time.time()
    option_price_with_numba = option_pricing_with_numba(S0, K, r, sigma, T, num_simulations, num_steps)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Option Pricing with Numba:")
    print("Option Price:", option_price_with_numba)
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
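For a European call under the same geometric Brownian motion model, the Black-Scholes formula gives a closed-form price that can serve as a reference for the simulation. The sketch below is not part of the benchmark; it uses only the standard library, and the function names norm_cdf and black_scholes_call are hypothetical. A Monte Carlo estimate of the discounted expected payoff, with the same parameters, should land close to this value.

```python
import math


def norm_cdf(x):
    # Standard normal cumulative distribution via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S0, K, r, sigma, T):
    # Closed-form price of a European call option
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# Same option parameters as the Monte Carlo example
reference_price = black_scholes_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print("Black-Scholes reference price:", round(reference_price, 2))  # 10.45
```

Comparing against a reference like this catches modeling mistakes that a pure speed benchmark would never reveal.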

Parallelization:

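As the prime-counting example demonstrated, Numba’s support for parallel processing (parallel=True together with numba.prange) allows developers to use all the cores of a multicore processor and tackle large-scale parallel computations efficiently. In the matrix multiplication below, the outer loop over rows is parallelized; this is safe because each iteration writes to a different row of the result.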

import numpy as np
import time
import numba


def matrix_multiply_without_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in range(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result


@numba.njit(parallel=True)
def matrix_multiply_with_numba(A, B):
    m, n, p = A.shape[0], A.shape[1], B.shape[1]
    result = np.zeros((m, p), dtype=np.float64)

    for i in numba.prange(m):
        for j in range(p):
            for k in range(n):
                result[i, j] += A[i, k] * B[k, j]

    return result

def main():
    # Generate large random matrices
    size = 200
    A = np.random.rand(size, size)
    B = np.random.rand(size, size)

    # Without Numba
    start_time = time.time()
    result_without_numba = matrix_multiply_without_numba(A, B)
    end_time = time.time()
    execution_time_without_numba = end_time - start_time

    print("Matrix Multiplication without Numba:")
    print("Execution time:", execution_time_without_numba, "seconds")

    # With Numba Parallelization
    start_time = time.time()
    result_with_numba = matrix_multiply_with_numba(A, B)
    end_time = time.time()
    execution_time_with_numba = end_time - start_time

    print("Matrix Multiplication with Numba Parallelization:")
    print("Execution time:", execution_time_with_numba, "seconds")

if __name__ == "__main__":
    main()
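The reason the outer loop can be handed to numba.prange is that each row of the result depends only on one row of A and all of B, so rows can be computed in any order, or at the same time. The sketch below illustrates that decomposition with the standard library’s thread pool and plain lists. It is an illustration of the independence only, not of the speedup: the names multiply_row and matrix_multiply_by_rows are hypothetical, and ordinary Python threads will not make this faster the way Numba’s compiled threads do.

```python
from concurrent.futures import ThreadPoolExecutor


def multiply_row(row, B):
    # One output row needs only one row of A and the whole of B
    inner = len(B)
    cols = len(B[0])
    return [sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]


def matrix_multiply_by_rows(A, B):
    # Each row is an independent task, so no two tasks write the same output
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda row: multiply_row(row, B), A))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

product = matrix_multiply_by_rows(A, B)
print(product)  # [[19.0, 22.0], [43.0, 50.0]]
```

A loop whose iterations all update one shared accumulator needs more care; in the prime-counting example Numba handles the shared count as a reduction.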

In all of these test cases, you should see a clear advantage from Numba once the workload is large enough. Because a JIT-compiled function is compiled on its first call, that call carries a one-time compilation cost, so on small inputs the benefit may be small or even negative. As the data size grows, the compiled loops pull further ahead of the plain Python implementations. This makes Numba a valuable asset wherever execution speed is crucial, such as scientific computing, machine learning, computational physics, financial modeling, and parallel processing.

GitHub Repositories for Code Comparison and Numba Use Cases:

If you’re interested in exploring the code comparisons between Python with Numba and other programming languages or delving deeper into various Numba use cases, you can find the relevant code and examples in the following GitHub repositories:

Conclusion:

Numba has undoubtedly proven to be a game-changer for Python developers seeking enhanced performance in computationally intensive tasks. By leveraging Numba’s JIT compilation capabilities, Python can compete with traditionally faster languages like C++, C#, Rust and JavaScript. However, it’s essential to consider the nature of the task at hand when deciding whether to use Numba or not. For numerical computations, simulations, scientific calculations, and algorithms that can benefit from parallelization, Numba can be a valuable addition to the Python developer’s toolbox. When performance is a critical factor, Numba empowers Python developers to achieve optimal execution speeds without sacrificing Python’s simplicity and expressiveness.
