Most of the time, Python is fast enough — especially when you lean on NumPy, Polars, or other well‑tuned libraries written in compiled languages like C. But now and then, you end up with a hot loop that won’t vectorise: maybe you’re walking a list of strings to clean them up, or you’re parsing messy text where each character matters. You profile it, you confirm the culprit, and you stare at a for loop that eats half your runtime. This is the moment Rust shines.
Rust gives you predictable performance, tight control over memory, and fearless concurrency, without the hassle of manual memory management. If you’re thinking “not another language to learn!”, the good news is that you don’t need to abandon Python to use Rust. You can keep your orchestration, your notebooks, your tests, and move only the tiny, boring inner loops to Rust. This keeps the Rust learning curve to an absolute minimum.
In this article, I’ll demonstrate how to call Rust from Python and compare the performance differences between running pure Python and a Python/Rust combination. This won’t be a tutorial on Rust programming, as I’m assuming you at least know the basics of that.
Why bother?
Now, you might think: if I know Rust, why would I even bother integrating it with Python? Why not just program in Rust?
Well, first, knowing Rust does not automatically make it the best language for your whole application. In many domains, such as ML, AI, scripting, and web backends, Python is already the language of choice.
Secondly, most code is not performance-critical. For those parts that are, you often need only a very small subset of Rust to make a real difference, so a tiny bit of Rust knowledge can go a long way.
Lastly, Python’s ecosystem is hard to replace. Even if you know Rust well, Python gives you immediate access to tools like:
- pandas
- NumPy
- scikit-learn
- Jupyter
- Airflow
- FastAPI tooling
- a huge amount of scripting and automation libraries
Rust may be faster, but Python often wins on ecosystem reach and development convenience.
Hopefully, I’ve done enough to convince you to give integrating Rust with Python a chance. With that being said, let’s get started.
Rust and Maturin
For our use cases, we need two things: Rust and a tool called maturin.
Most of you will know about Rust: a fast, compiled language that has come to the fore in recent years. You might not have heard of maturin, though.
Maturin is a build and packaging tool for Python extensions written in Rust (using PyO3 or rust-cpython). It does the following for us:
Builds your Rust code into a Python module
- Takes your Rust crate and compiles it into a shared library (.pyd on Windows, .so on Linux, .dylib on macOS) that Python can import.
- Automatically sets the correct compiler flags for release/debug and for the Python version you’re targeting.
- Works with PyO3’s extension module feature, so Python can import the compiled library as a normal module.
Packages wheels for distribution
- Wheels are the .whl files you upload to PyPI (precompiled binaries).
- Maturin supports building wheels for manylinux, macOS, and Windows that work across Python versions and platforms.
- It cross-compiles when needed, or runs inside a Docker image to satisfy PyPI’s “manylinux” rules.
Publishes to PyPI
- With one command, Maturin can build your Rust extension and upload it.
- Handles credentials, metadata, and platform tags automatically.
Integrates Rust into Python packaging
- Maturin generates a pyproject.toml that defines your project so Python tools like pip know how to build it.
- Supports PEP 517, so pip install works even if the user doesn’t have maturin installed.
- Supports mixed Python/Rust packages, so pure-Python modules and a compiled Rust extension can live side by side in a single project.
OK, that’s enough of the theory, let’s get down to writing, running, and timing some code samples.
Setting up a development environment
As usual, we’ll set up a separate development environment to do our work. That way, our work won’t interfere with any other projects we might have on the go. I use the uv tool for this, and my operating system is Ubuntu running under WSL2 on Windows.
$ uv init pyrust
$ cd pyrust
$ uv venv pyrust
$ source pyrust/bin/activate
(pyrust) $
Installing Rust
Now we can install Rust with this simple command.
(pyrust) $ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Eventually, three options will be displayed on your screen, like this.
Welcome to Rust!
This will download and install the official compiler for the Rust
programming language, and its package manager, Cargo.
...
...
...
1) Proceed with standard installation (default - just press enter)
2) Customize installation
3) Cancel installation
Press Enter at the prompt to accept the standard installation (option 1, the default). Once it finishes, run source "$HOME/.cargo/env" (or open a new shell) so that rustc and cargo are on your PATH. To ensure Rust is properly installed, run the following command.
(pyrust) $ rustc --version
rustc 1.89.0 (29483883e 2025-08-04)
Example 1 — A Hello World equivalent
Let’s start with a simple example of calling Rust from Python. Create a new sub-folder and add these three files.
Cargo.toml
[package]
name = "hello_rust"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib"]
[dependencies]
pyo3 = { version = "0.22", features = ["extension-module"] }
pyproject.toml
[build-system]
requires = ["maturin>=1.5,<2"]
build-backend = "maturin"
[project]
name = "hello_rust"
version = "0.1.0"
requires-python = ">=3.9"
Lastly, our Rust source code goes in the file src/lib.rs.
use pyo3::prelude::*;
/// A simple function we’ll expose to Python
#[pyfunction]
fn greet(name: &str) -> PyResult<String> {
    Ok(format!("Hello, {name} from Rust!"))
}
/// The module definition
#[pymodule]
fn hello_rust(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(greet, m)?)?;
    Ok(())
}
Build and install the module with pip install . (pip picks up maturin as the build backend from pyproject.toml), then run it with …
(pyrust) $ python -c "import hello_rust as hr; print(hr.greet('world'))"
# Output
Hello, world from Rust!
We put our Rust code in src/lib.rs to follow the convention that Rust library code goes there, rather than in src/main.rs, which is reserved for stand-alone Rust executable code.
Maturin + PyO3 looks inside src/lib.rs for the #[pymodule] function, which registers your Rust functions for Python to call.
Example 2 — Python loops vs Rust loops
Consider something deliberately mundane but representative: you have a list of sentences and need to normalise them. By normalise, I mean converting them into a standard, consistent form before further processing.
Suppose we want to lowercase everything, drop punctuation, and split into tokens. This is hard to vectorise efficiently because the logic branches on every character.
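To see why this resists vectorisation, note that Python’s C-accelerated string methods alone don’t meet the spec: lower() and split() are fast, but they leave punctuation attached to the tokens, so we still need per-character branching to drop it. A quick illustration:

```python
text = "Hello, World! 123"

# The fast, C-accelerated one-liner is not equivalent:
# punctuation survives, glued to the tokens.
naive = text.lower().split()
print(naive)  # ['hello,', 'world!', '123']
```

Stripping that punctuation correctly means inspecting each character, and that per-character loop is precisely what’s slow in pure Python.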
In pure Python, you might write this:
# ------------------------
# Python baseline
# ------------------------
def process_one_py(text: str) -> list[str]:
    word = []
    out = []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
        # ignore punctuation
    if word:
        out.append("".join(word))
    return out

# Run the above for many inputs
def batch_process_py(texts: list[str]) -> list[list[str]]:
    return [process_one_py(t) for t in texts]
So, for example,
batch_process_py(["Hello, World! 123", "This is a test"])
Would return,
[['hello', 'world', '123'], ['this', 'is', 'a', 'test']]
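A couple of edge cases make the intended behaviour concrete. The checks below use the whitespace-splitting variant that the benchmark script later in this article uses: punctuation inside a word is simply dropped, rather than treated as a token boundary.

```python
def process_one_py(text: str) -> list[str]:
    word = []
    out = []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
        # ignore punctuation
    if word:
        out.append("".join(word))
    return out

# Punctuation inside a word disappears rather than splitting it.
print(process_one_py("don't stop!"))  # ['dont', 'stop']

# Empty and punctuation-only inputs yield no tokens at all.
print(process_one_py(""))             # []
print(process_one_py("?! -- !!"))     # []
```

These are exactly the cases where the Rust version below must agree with the Python baseline, so they make a handy sanity check after compiling the extension.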
This is what the Rust equivalent might look like,
/// src/lib.rs
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;
/// Process one string: lowercase + drop punctuation + split on whitespace
fn process_one(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut word = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c.to_ascii_lowercase());
        } else if c.is_whitespace() {
            if !word.is_empty() {
                out.push(std::mem::take(&mut word));
            }
        }
        // ignore punctuation entirely
    }
    if !word.is_empty() {
        out.push(word);
    }
    out
}

#[pyfunction]
fn batch_process(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.iter().map(|t| process_one(t)).collect())
}

#[pymodule]
fn rust_text(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(batch_process, m)?)?;
    Ok(())
}
Ok, let’s run these two programs with a large input (500,000 texts) and see what the run-time differences are. For that, I’ve written a benchmark Python script as follows.
from time import perf_counter
from statistics import median
import random
import string

import rust_text  # the compiled extension

# ------------------------
# Python baseline
# ------------------------
def process_one_py(text: str) -> list[str]:
    word = []
    out = []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
        # ignore punctuation
    if word:
        out.append("".join(word))
    return out

def batch_process_py(texts: list[str]) -> list[list[str]]:
    return [process_one_py(t) for t in texts]

# ------------------------
# Synthetic data
# ------------------------
def make_texts(n=500_000, vocab=10_000, mean_len=40):
    words = ["".join(random.choices(string.ascii_lowercase, k=5)) for _ in range(vocab)]
    texts = []
    for _ in range(n):
        L = max(3, int(random.expovariate(1 / mean_len)))
        texts.append(" ".join(random.choice(words) for _ in range(L)))
    return texts

texts = make_texts()

# ------------------------
# Timing helper
# ------------------------
def timeit(fn, *args, repeat=5):
    runs = []
    for _ in range(repeat):
        t0 = perf_counter()
        fn(*args)
        t1 = perf_counter()
        runs.append(t1 - t0)
    return median(runs)

# ------------------------
# Run benchmarks
# ------------------------
py_time = timeit(batch_process_py, texts)
rust_time = timeit(rust_text.batch_process, texts)

n = len(texts)
print("\n--- Benchmark ---")
print(f"Python median: {py_time:.3f} s | throughput: {n/py_time:,.0f} texts/s")
print(f"Rust 1-thread median: {rust_time:.3f} s | throughput: {n/rust_time:,.0f} texts/s")
As before, we need to compile our Rust code so Python can import it. In the previous example, maturin was used indirectly as the build backend via pyproject.toml. Here, we call it directly from the command line:
(pyrust) $ maturin develop --release
And now we can simply run our benchmark code like this.
(pyrust) $ python benchmark.py
--- Benchmark ---
Python median: 5.159 s | throughput: 96,919 texts/s
Rust 1-thread median: 3.024 s | throughput: 165,343 texts/s
That was a reasonable speed-up without too much effort. There’s one more thing we can do to cut the runtime further.
Rust has access to a parallelising library called Rayon, making it easy to spread code across multiple CPU cores. In a nutshell, Rayon …
- Lets you replace sequential iterators (iter()) with parallel iterators (par_iter()).
- Automatically splits your data into chunks, distributes the work across CPU threads, and then merges the results.
- Abstracts away the complexity of thread management and synchronisation
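For intuition, the closest pure-Python analogue to par_iter() is mapping the work over a thread pool. The catch is that for a CPU-bound loop like ours, the GIL means only one thread runs Python bytecode at a time, so you get the convenience without the speed-up. That is the gap Rayon closes: the Rust threads never touch the interpreter, so they genuinely run in parallel. A minimal sketch (the tokenize helper here is illustrative, not part of the article’s benchmark):

```python
from concurrent.futures import ThreadPoolExecutor

def tokenize(text: str) -> list[str]:
    # Same normalisation rules as the article's process_one_py.
    word, out = [], []
    for c in text:
        if c.isalnum():
            word.append(c.lower())
        elif c.isspace():
            if word:
                out.append("".join(word))
                word = []
    if word:
        out.append("".join(word))
    return out

texts = ["Hello, World!", "This is a test"] * 4

# A thread pool produces the same results as a plain loop...
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(tokenize, texts))

# ...but because the GIL serialises CPU-bound Python code,
# don't expect a Rayon-style speed-up from this.
assert threaded == [tokenize(t) for t in texts]
```

This is also why the Rust functions below can scale with core count: PyO3 converts the input to a Vec&lt;String&gt; up front, so Rayon’s worker threads operate on pure Rust data with no GIL in sight.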
Example 3 — Adding parallelism to our existing Rust code
This is straightforward. First, add Rayon to the [dependencies] section of Cargo.toml (rayon = "1"). Then, in the Rust code from the previous example, we only need to make the following three minor changes (marked with comments below).
/// src/lib.rs
use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

/// Add this line - Change 1
use rayon::prelude::*;

/// Process one string: lowercase + drop punctuation + split on whitespace
fn process_one(text: &str) -> Vec<String> {
    let mut out = Vec::new();
    let mut word = String::new();
    for c in text.chars() {
        if c.is_alphanumeric() {
            word.push(c.to_ascii_lowercase());
        } else if c.is_whitespace() {
            if !word.is_empty() {
                out.push(std::mem::take(&mut word));
            }
        }
        // ignore punctuation entirely
    }
    if !word.is_empty() {
        out.push(word);
    }
    out
}

#[pyfunction]
fn batch_process(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.iter().map(|t| process_one(t)).collect())
}

/// Add this function - Change 2
#[pyfunction]
fn batch_process_parallel(texts: Vec<String>) -> PyResult<Vec<Vec<String>>> {
    Ok(texts.par_iter().map(|t| process_one(t)).collect())
}

#[pymodule]
fn rust_text(_py: Python<'_>, m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(batch_process, m)?)?;
    // Add this line - Change 3
    m.add_function(wrap_pyfunction!(batch_process_parallel, m)?)?;
    Ok(())
}
In our benchmark Python code, we only need to add a call to the parallel Rust code and print out the new results.
...
...
# ------------------------
# Run amended benchmarks
# ------------------------
py_time = timeit(batch_process_py, texts)
rust_time = timeit(rust_text.batch_process, texts)
rust_par_time = timeit(rust_text.batch_process_parallel, texts)
n = len(texts)
print("\n--- Benchmark ---")
print(f"Python median: {py_time:.3f} s | throughput: {n/py_time:,.0f} texts/s")
print(f"Rust 1-thread median: {rust_time:.3f} s | throughput: {n/rust_time:,.0f} texts/s")
print(f"Rust Rayon median: {rust_par_time:.3f} s | throughput: {n/rust_par_time:,.0f} texts/s")
Here are the results from running the amended benchmark.
--- Benchmark ---
Python median: 5.171 s | throughput: 96,694 texts/s
Rust 1-thread median: 3.091 s | throughput: 161,755 texts/s
Rust Rayon median: 2.223 s | throughput: 224,914 texts/s
The parallelised Rust code shaved about 28% off the single-threaded Rust time and was more than twice as fast as the bare Python code. Not too shabby.
Summary
Python is usually fast enough for most tasks. But if profiling shows a slow spot that can’t be vectorised and really affects your runtime, you don’t have to give up on Python or rewrite your whole project. Instead, you can move just the performance-critical parts to Rust and leave the rest of your code as it is.
With PyO3 and maturin, you can compile Rust code into a Python module that works smoothly with your existing libraries. This lets you keep most of your Python code, tests, packaging, and workflows, while getting the speed, memory safety, and concurrency benefits of Rust where you need them most.
The simple examples and benchmarks here show that rewriting just a small part of your code in Rust can make Python much faster. Adding Rayon for parallelism boosts performance even more, with only a few code changes and no complicated tools. This is a practical and easy way to speed up Python workloads without switching your whole project to Rust.