This is the second instalment in my two-part series on the Ray library, a Python framework created by AnyScale for distributed and parallel computing. Part 1 covered how to parallelise CPU-intensive Python jobs on your local PC by distributing the workload across all available cores, resulting in marked improvements in runtime. I’ll leave a link to Part 1 at the end of this article.
This part deals with a similar theme, except we take distributing Python workloads to the next level by using Ray to parallelise them across multi-server clusters in the cloud.
If you’ve come to this without having read Part 1, the TL;DR of Ray is that it is an open-source distributed computing framework designed to make it easy to scale Python programs from a laptop to a cluster with minimal code changes. That alone should hopefully be enough to pique your interest. In my own test, on my desktop PC, I took a straightforward, relatively simple Python program that finds prime numbers and reduced its runtime by a factor of 10 by adding just four lines of code.
Where can you run Ray clusters?
Ray clusters can be set up on the following:
- AWS and GCP, which have official integrations; unofficial community integrations exist for other providers, too, such as Azure.
- AnyScale, a fully managed platform developed by the creators of Ray.
- Kubernetes, via the officially supported KubeRay project.
Prerequisites
To follow along with my process, you’ll need a few things set up beforehand. I’ll be using AWS for my demo, as I have an existing account there; however, I expect the setup for other cloud providers and platforms to be very similar. You should have:
- Credentials set up to run Cloud CLI commands from your chosen provider.
- A default VPC with at least one public subnet (one that assigns instances publicly reachable IP addresses).
- An SSH key pair file (.pem) downloaded to your local system so that Ray (and you) can connect to the nodes in your cluster.
- Sufficient service quotas to cover the requested number of nodes and vCPUs in whichever cluster you set up.
If you want to do some local testing of your Ray code before deploying it to a cluster, you’ll also need to install the Ray library. We can do that using pip.
$ pip install ray
I’ll be running everything from a WSL2 Ubuntu shell on my Windows desktop.
To verify that Ray has been installed correctly, you should be able to use its command-line interface. In a terminal window, type in the following command.
$ ray --help
Usage: ray [OPTIONS] COMMAND [ARGS]...

Options:
  --logging-level TEXT   The logging level threshold, choices=['debug',
                         'info', 'warning', 'error', 'critical'],
                         default='info'
  --logging-format TEXT  The logging format.
                         default="%(asctime)s\t%(levelname)s
                         %(filename)s:%(lineno)s -- %(message)s"
  --version              Show the version and exit.
  --help                 Show this message and exit.

Commands:
  attach            Create or attach to a SSH session to a Ray cluster.
  check-open-ports  Check open ports in the local Ray cluster.
  cluster-dump      Get log data from one or more nodes.
  ...
If you don’t see this, something has gone wrong, and you should double-check the output of your install command.
Assuming everything is OK, we’re good to go.
One last important point, though. Creating resources, such as compute clusters, on a cloud provider like AWS will incur costs, so it’s essential you bear this in mind. The good news is that Ray has a built-in command that will tear down any infrastructure you create, but to be safe, you should double-check that no unused and potentially costly services get left “switched on” by mistake.
Our example Python code
The first step is to modify our existing Ray code from Part 1 to run on a cluster. Here is the original code for your reference. Recall that we are trying to count the number of prime numbers within a specific numeric range.
import math
import time

# -----------------------------------------
# Change No. 1
# -----------------------------------------
import ray

ray.init()


def is_prime(n: int) -> bool:
    if n < 2: return False
    if n == 2: return True
    if n % 2 == 0: return False
    r = math.isqrt(n) + 1
    for i in range(3, r, 2):
        if n % i == 0:
            return False
    return True


# -----------------------------------------
# Change No. 2
# -----------------------------------------
@ray.remote(num_cpus=1)  # pure-Python loop → 1 CPU per task
def count_primes(a: int, b: int) -> int:
    c = 0
    for n in range(a, b):
        if is_prime(n):
            c += 1
    return c


if __name__ == "__main__":
    A, B = 10_000_000, 20_000_000

    total_cpus = int(ray.cluster_resources().get("CPU", 1))
    # Start "chunky"; we can sweep this later
    chunks = max(4, total_cpus * 2)
    step = (B - A) // chunks

    print(f"nodes={len(ray.nodes())}, CPUs~{total_cpus}, chunks={chunks}")

    t0 = time.time()
    refs = []
    for i in range(chunks):
        s = A + i * step
        e = s + step if i < chunks - 1 else B
        # -----------------------------------------
        # Change No. 3
        # -----------------------------------------
        refs.append(count_primes.remote(s, e))

    # -----------------------------------------
    # Change No. 4
    # -----------------------------------------
    total = sum(ray.get(refs))
    print(f"total={total}, time={time.time() - t0:.2f}s")
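As a quick aside, the chunking arithmetic in the main block can be pulled out into a small, pure-Python helper (my own illustrative addition, not part of the article’s script) to confirm that the sub-ranges tile [A, B) exactly, with the final chunk absorbing any remainder:

```python
def make_chunks(a: int, b: int, chunks: int):
    """Split the half-open range [a, b) into `chunks` contiguous sub-ranges,
    mirroring the loop in the script: equal-width steps, with the final
    chunk extended to cover any remainder from the integer division."""
    step = (b - a) // chunks
    spans = []
    for i in range(chunks):
        s = a + i * step
        e = s + step if i < chunks - 1 else b
        spans.append((s, e))
    return spans

# The sub-ranges should cover [a, b) exactly, with no gaps or overlaps.
spans = make_chunks(10_000_000, 20_000_000, 384)
assert spans[0][0] == 10_000_000 and spans[-1][1] == 20_000_000
assert all(spans[i][1] == spans[i + 1][0] for i in range(len(spans) - 1))
```

Checks like this are cheap to run locally before you pay for cluster time.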
What modifications are needed to run it on a cluster? The answer is that just one minor change is required.
Change
ray.init()
to
ray.init(address="auto")
That’s one of the beauties of Ray. The same code runs almost unmodified on your local PC, and anywhere else you care to run it, including large, multi-server cloud clusters.
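If you’d rather not edit the script when switching between local and cluster runs, one option is to choose the init arguments from an environment variable. This is my own sketch, and ON_RAY_CLUSTER is a made-up variable name, not anything Ray itself reads:

```python
import os

def ray_init_kwargs(env=None):
    """Return kwargs for ray.init(): connect to an existing cluster when the
    (hypothetical) ON_RAY_CLUSTER env var is set, otherwise start Ray locally."""
    env = os.environ if env is None else env
    if env.get("ON_RAY_CLUSTER"):
        return {"address": "auto"}
    return {}

# The script would then call ray.init(**ray_init_kwargs()) in both environments.
assert ray_init_kwargs({"ON_RAY_CLUSTER": "1"}) == {"address": "auto"}
assert ray_init_kwargs({}) == {}
```

You would set the variable in the cluster’s YAML setup commands (or shell profile) so the same file runs unmodified everywhere.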
Setting up our cluster
On the cloud, a Ray cluster consists of a head node and one or more worker nodes. In AWS, all these nodes are simply EC2 instances. Ray clusters can be fixed-size, or they can autoscale based on the resources requested by the applications running on them. The head node is started first, and the worker nodes are configured with the head node’s address to form the cluster. If autoscaling is enabled, Ray adds worker nodes as the application’s demand grows and removes them once they have been idle for a user-specified period (5 minutes by default).
Ray uses YAML files to set up clusters. A YAML file is a plain-text configuration file that uses indentation to express structure (YAML is, in fact, a superset of JSON).
Here is the YAML file I’ll be using to set up my cluster. I found that the closest EC2 instance to my desktop PC, in terms of CPU core count and performance, was a c7g.8xlarge. For simplicity, I’m having the head node be the same server type as all the workers, but you can mix and match different EC2 types if desired.
cluster_name: ray_test

provider:
  type: aws
  region: eu-west-1
  availability_zone: eu-west-1a

auth:
  # For Amazon Linux AMIs the SSH user is 'ec2-user'.
  # If you switch to an Ubuntu AMI, change this to 'ubuntu'.
  ssh_user: ec2-user
  ssh_private_key: ~/.ssh/ray-autoscaler_eu-west-1.pem

max_workers: 10
idle_timeout_minutes: 10

head_node_type: head_node

available_node_types:
  head_node:
    node_config:
      InstanceType: c7g.8xlarge
      ImageId: ami-06687e45b21b1fca9
      KeyName: ray-autoscaler_eu-west-1
  worker_node:
    min_workers: 5
    max_workers: 5
    node_config:
      InstanceType: c7g.8xlarge
      ImageId: ami-06687e45b21b1fca9
      KeyName: ray-autoscaler_eu-west-1
      InstanceMarketOptions:
        MarketType: spot

# =========================
# Setup commands (run on head + workers)
# =========================
setup_commands:
  - |
    set -euo pipefail

    have_cmd() { command -v "$1" >/dev/null 2>&1; }

    have_pip_py() {
      python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec("pip") else 1)'
    }

    # 1) Ensure Python 3 is present
    if ! have_cmd python3; then
      if have_cmd dnf; then
        sudo dnf install -y python3
      elif have_cmd yum; then
        sudo yum install -y python3
      elif have_cmd apt-get; then
        sudo apt-get update -y
        sudo apt-get install -y python3
      else
        echo "No supported package manager found to install python3." >&2
        exit 1
      fi
    fi

    # 2) Ensure pip exists
    if ! have_pip_py; then
      python3 -m ensurepip --upgrade >/dev/null 2>&1 || true
    fi
    if ! have_pip_py; then
      if have_cmd dnf; then
        sudo dnf install -y python3-pip || true
      elif have_cmd yum; then
        sudo yum install -y python3-pip || true
      elif have_cmd apt-get; then
        sudo apt-get update -y || true
        sudo apt-get install -y python3-pip || true
      fi
    fi
    if ! have_pip_py; then
      curl -fsS https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
      python3 /tmp/get-pip.py
    fi

    # 3) Upgrade packaging tools and install Ray
    python3 -m pip install -U pip setuptools wheel
    python3 -m pip install -U "ray[default]"
Here is a brief explanation of each critical YAML section.
- cluster_name: Assigns a name to the cluster, allowing Ray to track and manage it separately from others.
- provider: Specifies which cloud to use (AWS here), along with the region and availability zone for launching instances.
- auth: Defines how Ray connects to instances over SSH: the user name and the private key used for authentication.
- max_workers: Sets the maximum number of worker nodes Ray can scale up to when more compute is needed.
- idle_timeout_minutes: Tells Ray how long to wait before automatically terminating idle worker nodes.
- available_node_types: Describes the different node types (head and workers), their instance sizes, AMI images, and scaling limits.
- head_node_type: Identifies which of the node types acts as the cluster's controller (the head node).
- setup_commands: Lists shell commands that run once on each node when it's first created, typically to install software or set up the environment.
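As a sanity check on the sizing above: a c7g.8xlarge provides 32 vCPUs, so one head node plus five workers should expose 192 CPUs to Ray. A quick back-of-the-envelope check:

```python
# Per AWS instance specs, a c7g.8xlarge has 32 vCPUs.
VCPUS_PER_C7G_8XLARGE = 32
head_nodes, workers = 1, 5  # head_node plus worker_node min_workers from the YAML

total_vcpus = (head_nodes + workers) * VCPUS_PER_C7G_8XLARGE
assert total_vcpus == 192
```

Keeping this arithmetic in mind helps when you later read Ray’s reported cluster resources and quota errors.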
To start the cluster creation, use this ray command from the terminal.
$ ray up -y ray_test.yaml
Ray will do its thing, creating all the necessary infrastructure, and after a few minutes, you should see something like this in your terminal window.
...
...
...
Next steps
To add another node to this Ray cluster, run
ray start --address='10.0.9.248:6379'
To connect to this Ray cluster:
import ray
ray.init()
To submit a Ray job using the Ray Jobs CLI:
RAY_ADDRESS='http://10.0.9.248:8265' ray job submit --working-dir . -- python my_script.py
See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
for more information on submitting Ray jobs to the Ray cluster.
To terminate the Ray runtime, run
ray stop
To view the status of the cluster, use
ray status
To monitor and debug Ray, view the dashboard at
10.0.9.248:8265
If connection to the dashboard fails, check your firewall settings and network configuration.
Shared connection to 108.130.38.255 closed.
New status: up-to-date
Useful commands:
To terminate the cluster:
ray down /mnt/c/Users/thoma/ray_test.yaml
To retrieve the IP address of the cluster head:
ray get-head-ip /mnt/c/Users/thoma/ray_test.yaml
To port-forward the cluster's Ray Dashboard to the local machine:
ray dashboard /mnt/c/Users/thoma/ray_test.yaml
To submit a job to the cluster, port-forward the Ray Dashboard in another terminal and run:
ray job submit --address http://localhost:<dashboard-port> --working-dir . -- python my_script.py
To connect to a terminal on the cluster head for debugging:
ray attach /mnt/c/Users/thoma/ray_test.yaml
To monitor autoscaling:
ray exec /mnt/c/Users/thoma/ray_test.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
Running a Ray job on a cluster
At this stage, the cluster has been built, and we are ready to submit our Ray job to it. To give the cluster something more substantial to work with, I widened the prime-search range in my code from 10,000,000–20,000,000 to 10,000,000–60,000,000. On my local desktop, Ray ran this larger search in 18 seconds.
I waited a short time for all the cluster nodes to initialise fully, then ran the code on the cluster with this command. (The script has to be present on the head node first; I had copied mine across beforehand. Ray’s ray submit command can copy a local script to the cluster and run it in a single step.)
$ ray exec ray_test.yaml 'python3 ~/ray_test.py'
Here is my output.
(base) tom@tpr-desktop:/mnt/c/Users/thoma$ ray exec ray_test2.yaml 'python3 ~/primes_ray.py'
2025-11-01 13:44:22,983 INFO util.py:389 -- setting max workers for head node type to 0
Loaded cached provider configuration
If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
Fetched IP: 52.213.155.130
Warning: Permanently added '52.213.155.130' (ED25519) to the list of known hosts.
2025-11-01 13:44:26,469 INFO worker.py:1832 -- Connecting to existing Ray cluster at address: 10.0.5.86:6379...
2025-11-01 13:44:26,477 INFO worker.py:2003 -- Connected to Ray cluster. View the dashboard at http://10.0.5.86:8265
nodes=6, CPUs~192, chunks=384
(autoscaler +2s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
(autoscaler +2s) No available node types can fulfill resource requests {'CPU': 1.0}*160. Add suitable node types to this cluster to resolve this issue.
total=2897536, time=5.71s
Shared connection to 52.213.155.130 closed.
As you can see, the cluster completed the run in just under six seconds (5.71s). So, a cluster of one head node and five workers ran the same job in less than a third of the time it took on my local PC. Not too shabby.
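For the record, the speedup works out as follows (times taken from the runs above):

```python
local_seconds = 18.0    # desktop run reported earlier in the article
cluster_seconds = 5.71  # cluster run from the output above

speedup = local_seconds / cluster_seconds
assert 3.1 < speedup < 3.2  # roughly a 3x improvement
```

A 3x gain from six nodes is well short of linear, but the whole job lasts only a few seconds, so connection and task-scheduling overheads eat a large share of the wall clock; the autoscaler warning in the output also shows that not every task could be placed on a CPU immediately.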
When you’re finished with your cluster, please run the following Ray command to tear it down.
$ ray down -y ray_test.yaml
As I mentioned before, you should always double-check your account to ensure this command has worked as expected.
Summary
This article, the second in a two-part series, demonstrates how to run CPU-intensive Python code on cloud-based clusters using the Ray library. By spreading the workload across all the vCPUs in the cluster, Ray cut the runtime to a fraction of the single-machine figure with almost no code changes.
I described and showed how to create a cluster using a YAML file and how to utilise the Ray command-line interface to submit code for execution on the cluster.
Using AWS as an example platform, I took the Ray Python code that had been running on my local PC and ran it, almost unchanged, on a six-node EC2 cluster. This delivered a significant performance improvement (roughly 3x) over the non-cluster run time.
Finally, I showed how to use the ray command-line tool to tear down the AWS cluster infrastructure Ray had created.
If you haven’t already read my first article in this series, click on the link below to check it out.
Please note that, other than being an occasional user of their services, I have no affiliation with AnyScale, AWS, or any other organisation mentioned in this article.