MLX vs MPS vs CUDA: a Benchmark | by Tristan Bilot

Contents

A first benchmark of Apple’s new ML framework MLX
Crafting an environment
GCN implementation

A first benchmark of Apple’s new ML framework MLX


If you’re a Mac user and a deep learning enthusiast, you’ve probably wished at some point that your Mac could handle those heavy models, right? Well, guess what? Apple just released MLX, a framework for running ML models efficiently on Apple Silicon.

The recent introduction of the MPS backend in PyTorch 1.12 was already a bold step, but with the announcement of MLX, it seems that Apple wants to make a significant leap into open source deep learning.

In this article, we’ll put these new approaches through their paces, benchmarking them against the traditional CPU backend and two CUDA-enabled GPUs. The aim is to show how practical these new Mac-compatible backends are for deep learning experiments in 2024.

As a GNN-oriented researcher, I’ll focus the benchmark on a Graph Convolutional Network (GCN) model. But since this model mainly consists of linear layers, our findings could be insightful even for those not specifically in the GNN sphere.

Crafting an environment

To build an environment for MLX, we have to specify whether to use the i386 or the arm architecture. With conda, this can be done using:

CONDA_SUBDIR=osx-arm64 conda create -n mlx python=3.10 numpy -c conda-forge
conda activate mlx

To check that your env is actually using arm, run the following command. Its output should be arm, not i386:

python -c "import platform; print(platform.processor())"
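
If you want the same check at the top of a benchmark script, it can be wrapped in a small function. This is only a minimal sketch: the helper name `is_arm_interpreter` is hypothetical, and accepting the value of `platform.machine()` in addition to `platform.processor()` is my own assumption, made because the two calls can report different strings.

```python
import platform


def is_arm_interpreter(processor: str, machine: str) -> bool:
    """Return True if either identifier starts with 'arm' (e.g. 'arm', 'arm64')."""
    return any(name.lower().startswith("arm") for name in (processor, machine))


if __name__ == "__main__":
    ok = is_arm_interpreter(platform.processor(), platform.machine())
    # Hypothetical message; adapt it to your own setup.
    print("arm interpreter" if ok else "not arm: recreate the env with CONDA_SUBDIR=osx-arm64")
```

Taking the two strings as arguments, rather than calling `platform` inside the function, keeps the check easy to test on any machine.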

Now simply install MLX using pip, and you’re all set to start exploring:

pip install mlx
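
To confirm the install works, a quick smoke test can help. The sketch below is an illustration, not part of the benchmark: the function name `mlx_smoke_test` is hypothetical, and it returns `None` when MLX cannot be imported, so it is safe to run on machines without MLX.

```python
def mlx_smoke_test():
    """Run a tiny matmul with MLX and return the default device as a string.

    Returns None if MLX is not installed in the current environment.
    """
    try:
        import mlx.core as mx
    except ImportError:
        return None

    a = mx.array([[1.0, 2.0], [3.0, 4.0]])
    b = a @ a    # MLX is lazy: this only records the operation
    mx.eval(b)   # force the computation to actually run
    return str(mx.default_device())


if __name__ == "__main__":
    print(mlx_smoke_test())
```

On an Apple Silicon Mac, the returned string should name the GPU device; `None` means the package was not found in the active env.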

GCN implementation

The GCN model, a type of Graph Neural Network (GNN), works with an adjacency matrix (representing the graph structure) and node features. It calculates node embeddings by gathering info from neighboring nodes. Specifically, each node gets the average of its neighbors’ features. This averaging is done by multiplying the node features with the normalized adjacency matrix, adjusted by node degree. To learn this process, the features are first projected into an embedding space via a linear layer.
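
The description above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the implementation benchmarked in this article: the function names are hypothetical, and the adjacency matrix is row-normalized (divided by node degree) so that the aggregation is a plain average, as described in the text. The original GCN formulation by Kipf and Welling instead uses a symmetric normalization with self-loops.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Row-normalize A so that A_norm @ X averages each node's neighbor features."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    return adj / deg


def gcn_layer(x, adj_norm, weight, bias):
    """Project the features with a linear layer, then aggregate over neighbors."""
    h = x @ weight + bias  # linear projection into the embedding space
    return adj_norm @ h    # each node receives the mean of its neighbors


# Toy path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=np.float64)
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
weight = np.eye(2)   # identity weights, so the averaging is easy to read
bias = np.zeros(2)

out = gcn_layer(x, normalize_adjacency(adj), weight, bias)
print(out)
```

With identity weights, node 1 ends up with the mean of the features of nodes 0 and 2, while nodes 0 and 2 each simply copy the features of their only neighbor, node 1.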