NumPy Integration: Use PyO3 with NumPy Arrays

NumPy arrays are the lingua franca of Python data science. When your extension needs to process numerical data in bulk—matrices, time series, images—PyO3 and the ndarray crate (Rust's NumPy equivalent) combine to provide zero-copy access, type safety, and blazing speed. This article shows you how to accept NumPy arrays from Python, operate on them in Rust without copying, and return results as NumPy arrays. You will learn to handle multidimensional data, iterate over arrays efficiently, and benchmark the performance gain over pure Python.

NumPy arrays are typed memory buffers described by a shape and strides; Rust's ndarray crate models the same layout natively. The numpy crate's PyReadonlyArray and PyReadwriteArray types let you borrow these buffers directly, eliminating the copy overhead that makes naive bindings slow. For numerical computing, this is the difference between "faster than Python" and "10–100× faster."

Setup: Adding ndarray to Your Project

In your Cargo.toml, add the ndarray crate and the numpy crate (the rust-numpy bindings) alongside PyO3:

[dependencies]
pyo3 = { version = "0.21", features = ["extension-module"] }
ndarray = "0.15"
numpy = "0.21" # rust-numpy bindings; the release must match your pyo3 version

The numpy crate (rust-numpy, maintained under the PyO3 organization) provides PyReadonlyArray<'py, T, D>, a read-only borrow of an array, and PyReadwriteArray<'py, T, D>, a mutable borrow. Both borrow from PyArray<T, D>, the type that represents the NumPy array object itself.

Accepting a NumPy Array: Immutable Access

The simplest case is a read-only operation on a NumPy array. Use PyReadonlyArray<'py, T, D> where T is the element type (i32, f64) and D is the dimensionality:

use pyo3::prelude::*;
use numpy::PyReadonlyArray1;

#[pyfunction]
fn sum_array(arr: PyReadonlyArray1<f64>) -> f64 {
    let array = arr.as_array();
    array.iter().sum()
}

#[pymodule]
fn numpy_ext(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(sum_array, m)?)?;
    Ok(())
}

From Python:

import numpy as np
from numpy_ext import sum_array

arr = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = sum_array(arr)
print(result) # Output: 15.0

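The PyReadonlyArray1<f64> parameter tells PyO3 to accept a 1-D NumPy array of 64-bit floats; an array with a different dtype or number of dimensions is rejected with a TypeError rather than converted. The .as_array() method returns an ndarray::ArrayView1<f64> that borrows the NumPy buffer, which you can then iterate over or operate on using ndarray methods.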

Multidimensional Arrays

For 2-D matrices, use PyReadonlyArray2<T>:

use pyo3::prelude::*;
use numpy::{PyArray2, PyReadonlyArray2};

#[pyfunction]
fn matrix_sum(matrix: PyReadonlyArray2<f64>) -> f64 {
    let array = matrix.as_array();
    array.iter().sum()
}

#[pyfunction]
fn transpose<'py>(
    py: Python<'py>,
    matrix: PyReadonlyArray2<'py, f64>,
) -> Bound<'py, PyArray2<f64>> {
    let array = matrix.as_array();
    // t() is a view; to_owned() copies it into a new Rust array.
    let transposed = array.t().to_owned();
    // Copy the result into a NumPy-owned array (Bound API, numpy 0.21).
    PyArray2::from_array_bound(py, &transposed)
}

#[pymodule]
fn matrix_ext(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(matrix_sum, m)?)?;
    m.add_function(wrap_pyfunction!(transpose, m)?)?;
    Ok(())
}

From Python:

import numpy as np
from matrix_ext import matrix_sum, transpose

matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
print(matrix_sum(matrix)) # Output: 10.0
print(transpose(matrix)) # Output: [[1. 3.] [2. 4.]]

The array.t() method returns a transposed view of the same data without copying; .to_owned() then copies that view into a new, owned array.
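The view-versus-copy distinction can be illustrated without ndarray at all. The following is a toy sketch in plain Python, not how ndarray or NumPy are implemented: the class name StridedView2D and its methods are invented for illustration, and strides are counted in elements rather than bytes. It shows that transposing only swaps shape and strides over a shared buffer, while to_owned() is the step that copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StridedView2D:
    """Hypothetical toy 2-D view over a flat buffer (strides in elements)."""

    data: list
    shape: tuple
    strides: tuple

    def __getitem__(self, index):
        row, col = index
        return self.data[row * self.strides[0] + col * self.strides[1]]

    def t(self):
        # Transposing swaps shape and strides; the buffer is shared, not copied.
        return StridedView2D(self.data, self.shape[::-1], self.strides[::-1])

    def to_owned(self):
        # Materialize the view as a fresh row-major buffer (this is the copy).
        rows, cols = self.shape
        flat = [self[r, c] for r in range(rows) for c in range(cols)]
        return StridedView2D(flat, self.shape, (cols, 1))


buffer = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
matrix = StridedView2D(buffer, shape=(2, 3), strides=(3, 1))

view = matrix.t()        # same buffer, shape (3, 2), strides (1, 3)
owned = view.to_owned()  # new buffer laid out in row-major order

print(view.data is buffer)  # the view shares the original storage
print(owned.data)           # the transposed elements, copied
```

In this sketch the transposed view is no longer laid out row by row, which is also why a transposed NumPy array is reported as non-contiguous in C order.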

Returning NumPy Arrays

To return a NumPy array from Rust, take a Python<'py> token as an argument and return Bound<'py, PyArray<T, D>>:

use pyo3::prelude::*;
use numpy::{PyArray1, PyReadonlyArray1};

#[pyfunction]
fn double<'py>(py: Python<'py>, arr: PyReadonlyArray1<'py, i32>) -> Bound<'py, PyArray1<i32>> {
    let array = arr.as_array();
    // Arithmetic on a borrowed view needs a reference; the result is a new owned array.
    let doubled = &array * 2;
    // Copy the owned result into a NumPy array (Bound API, numpy 0.21).
    PyArray1::from_array_bound(py, &doubled)
}

#[pymodule]
fn ops_ext(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(double, m)?)?;
    Ok(())
}

From Python:

import numpy as np
from ops_ext import double

arr = np.array([1, 2, 3, 4], dtype=np.int32)  # dtype must match the i32 signature
result = double(arr)
print(result) # Output: [2 4 6 8]

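Creating Python objects requires proof that the Global Interpreter Lock is held. Inside a #[pyfunction] the GIL is already held, so the idiomatic way to get that proof is to declare a py: Python<'py> argument, which PyO3 fills in automatically; wrapping the body in Python::with_gil(|py| ...) also works but is redundant there. Converting from a borrowed ndarray array (PyArray1::from_array and its Bound-API counterpart from_array_bound) copies the data into a new NumPy array. To hand an owned Array1 to NumPy without copying, move it instead, for example with the IntoPyArray trait.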

Mutable Arrays and In-Place Operations

To modify a NumPy array in place, borrow it mutably with PyReadwriteArray<'py, T, D> (not PyReadonlyArray):

use pyo3::prelude::*;
use numpy::PyReadwriteArray1;

#[pyfunction]
fn increment_inplace(mut arr: PyReadwriteArray1<f64>) {
    // Mutable view over the NumPy buffer; no new allocation.
    let mut array = arr.as_array_mut();
    array.iter_mut().for_each(|x| *x += 1.0);
}

#[pymodule]
fn inplace_ext(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(increment_inplace, m)?)?;
    Ok(())
}

From Python:

import numpy as np
from inplace_ext import increment_inplace

arr = np.array([1.0, 2.0, 3.0])
increment_inplace(arr)
print(arr) # Output: [2. 3. 4.]

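The as_array_mut() method returns a mutable view (ndarray::ArrayViewMut) of the NumPy buffer, so you can modify the array in place without allocating new memory. The mutable borrow is checked at run time and is refused if the array is read-only or already borrowed elsewhere.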

Performance: NumPy Arrays vs Python Lists

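A benchmark illustrates the performance advantage. Consider summing 10 million numbers, first as a Python list with the built-in sum(), then as a float64 NumPy array passed to the Rust sum_array function: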

import numpy as np
import time
from numpy_ext import sum_array

# Pure Python list
python_list = list(range(10_000_000))
start = time.perf_counter()
py_sum = sum(python_list)
py_time = (time.perf_counter() - start) * 1000 # ms
print(f"Python list: {py_time:.2f} ms")

# NumPy array via PyO3
numpy_arr = np.arange(10_000_000, dtype=np.float64)
start = time.perf_counter()
np_sum = sum_array(numpy_arr)
np_time = (time.perf_counter() - start) * 1000 # ms
print(f"NumPy + Rust: {np_time:.2f} ms")
print(f"Speedup: {py_time / np_time:.1f}×")

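Typical results show a 10–50× speedup, depending on the operation, the hardware, and whether the extension was built in release mode. The gain is largest for big arrays, where the fixed cost of crossing the Python/Rust boundary is spread over many elements; for small arrays that overhead dominates. Note that this benchmark compares Rust against a Python list. NumPy's own arr.sum() is also compiled code and is the fairer baseline when deciding whether an extension is worth writing.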
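A single perf_counter() measurement like the one above is sensitive to noise from caches, the garbage collector, and other processes. One common remedy is to repeat each measurement and keep the fastest run. The following is a minimal sketch using only the standard library; the helper names best_of_ms and compare are invented for illustration, and the commented-out entries assume the sum_array extension and numpy_arr from the benchmark above are available.

```python
import timeit


def best_of_ms(func, repeats=5):
    """Return the fastest of `repeats` single-call timings, in milliseconds."""
    timings = timeit.repeat(func, number=1, repeat=repeats)
    return min(timings) * 1000.0


def compare(candidates, repeats=5):
    """Time each named callable and return {name: best time in ms}."""
    return {name: best_of_ms(func, repeats) for name, func in candidates.items()}


data = list(range(200_000))
results = compare({
    "Python list": lambda: sum(data),
    # With the extension built, add entries such as:
    # "NumPy + Rust": lambda: sum_array(numpy_arr),
    # "NumPy sum":    lambda: numpy_arr.sum(),
})

for name, ms in results.items():
    print(f"{name}: {ms:.3f} ms")
```

Taking the minimum rather than the mean reports the run least disturbed by background activity, which tends to make comparisons between implementations more stable.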

Common Operations with ndarray

Operation                Code
Sum all elements         array.iter().sum()
Mean                     array.mean().unwrap()
Element-wise multiply    &array1 * &array2
Matrix multiply          array1.dot(&array2)
Transpose                array.t()
Slice (rows 1 and 2)     array.slice(s![1..3, ..])
Reshape                  array.into_shape((3, 4)).unwrap()

Key Takeaways

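  • PyReadonlyArray<T, D> accepts NumPy arrays without copying; .as_array() provides a borrowed ndarray::ArrayView.
  • PyReadwriteArray<T, D> allows mutable access and in-place modifications through .as_array_mut().
  • Returning NumPy arrays requires a Python<'py> token (a py argument, or Python::with_gil(|py| ...) outside a #[pyfunction]) and a constructor such as PyArray::from_array(), which copies the data.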
  • Multidimensional arrays use PyReadonlyArray2, PyReadonlyArray3, etc.
  • NumPy + Rust achieves 10–50× speedup over pure Python for compute-heavy operations.
  • The ndarray crate provides familiar methods (sum(), mean(), dot()) for numerical operations.

Frequently Asked Questions

Can I accept a NumPy array with any dtype, not just a specific type like f64?

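Yes. Accept a generic object (PyObject, or &Bound<'_, PyAny> in the Bound API) or the numpy crate's PyUntypedArray, inspect the dtype at runtime, and downcast to a concrete PyArray<T, D>. You lose compile-time type safety by doing so. For performance-critical code, fix the type in the signature and document the requirement.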

What if the NumPy array is not contiguous in memory (e.g., a transposed array or a view)?

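Non-contiguous arrays are accepted. The views returned by .as_array() and .as_array_mut() carry the array's strides, so ndarray handles transposed or sliced layouts transparently. Only the slice accessors need contiguity: .as_slice() returns an error for a non-contiguous array instead of a slice. If your Rust code depends on a flat slice, make the array contiguous first, for example with np.ascontiguousarray() on the Python side.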
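What "contiguous" means can be seen with the standard library alone, because NumPy arrays expose their memory through the same buffer protocol that memoryview uses. The sketch below is only an analogy built on array and memoryview; with NumPy the corresponding information is available as arr.strides and arr.flags. The variable names are illustrative.

```python
from array import array

values = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

full = memoryview(values)  # one element after another in memory
every_other = full[::2]    # a strided view: skips every second element

# The view shares the original storage; only the stride changes.
print(full.strides, full.c_contiguous)
print(every_other.strides, every_other.c_contiguous)

# Copying the strided view into a new array yields a contiguous buffer again,
# which is what np.ascontiguousarray() does for NumPy arrays.
packed = array("d", every_other.tolist())
print(memoryview(packed).c_contiguous)
```

The strided view still reads the right elements, but a consumer that requires one flat run of memory cannot use it directly, which is the situation .as_slice() guards against.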

Can I use broadcasting with PyO3 arrays?

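Yes, within limits. ndarray broadcasts a smaller operand against a larger one in arithmetic operators, and .broadcast(shape) returns an explicitly broadcast view; its documentation describes this support as limited compared with NumPy's. If your code relies on the full NumPy broadcasting rules, it can be simpler to broadcast in Python (NumPy is optimized for it) and pass pre-broadcast arrays to Rust.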

How do I handle arrays with different dtypes (int32, float64, complex)?

Write separate functions for each dtype or use generics with macros. For example: fn process_i32(arr: PyReadonlyArray1<i32>) and fn process_f64(arr: PyReadonlyArray1<f64>). Both are exposed to Python.
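On the Python side, the per-dtype functions can be hidden behind a single entry point that dispatches on arr.dtype. The following is a sketch under stated assumptions: make_dispatcher is an invented helper, and process_i32 and process_f64 are replaced by plain-Python stand-ins so the example runs without the compiled extension. A SimpleNamespace with a dtype attribute stands in for a NumPy array.

```python
from types import SimpleNamespace


def make_dispatcher(handlers):
    """Build a function that routes an array to the handler for its dtype."""

    def dispatch(arr):
        name = str(arr.dtype)  # e.g. "int32" or "float64" for NumPy arrays
        try:
            handler = handlers[name]
        except KeyError:
            supported = ", ".join(sorted(handlers))
            raise TypeError(
                f"unsupported dtype {name!r}; expected one of: {supported}"
            ) from None
        return handler(arr)

    return dispatch


# Stand-ins for the compiled functions (hypothetical; import the real ones instead).
def process_i32(arr):
    return "i32 path"


def process_f64(arr):
    return "f64 path"


process = make_dispatcher({"int32": process_i32, "float64": process_f64})

fake = SimpleNamespace(dtype="float64")  # mimics ndarray.dtype for the demo
print(process(fake))
```

Keeping the dispatch in Python leaves each Rust function with a fixed element type, and an unsupported dtype produces a clear error that lists the accepted ones.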

Is there a performance cost to the Python::with_gil() block when returning arrays?

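Acquiring the GIL costs roughly 10–100 nanoseconds, and inside a #[pyfunction] the GIL is already held, so Python::with_gil() there is only a cheap re-entrant check. Declaring a py: Python<'_> argument avoids even that. For most operations the cost is negligible compared to the computation time. Profile your code to verify.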

Further Reading