How to implement distributed computing with Python?

Posted by dan90joe on Sun, 26 Dec 2021 09:20:00 +0100

When facing compute-intensive tasks, the step beyond multiprocessing is distributed computing. How can you implement distributed computing with Python? A very simple way, shared today, is to use Ray.

What is Ray

Ray is a distributed computing framework for Python. It adopts a dynamic task graph computation model and provides a simple, general API for building distributed applications. It is very convenient to use: with a decorator and very few code changes, Python code that originally ran on a single machine can run as a distributed computation. At present it is used mostly for machine learning.

Ray features:

1. Provides simple primitives for building and running distributed applications.

2. Enables users to parallelize single-machine code with few or no code changes.

3. On top of Ray Core sits a large ecosystem of applications, libraries, and tools for building complex applications, for example Tune, RLlib, RaySGD, Serve, Datasets, and Workflows.

Installing Ray

The easiest way to install the official version:

pip install -U ray
pip install 'ray[default]'

On Windows, the Visual C++ runtime must be installed first.

See the official documentation for other installation methods.
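To verify the installation, a quick sanity check (the printed version will vary with your install):

python -c "import ray; print(ray.__version__)"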

Using Ray

One decorator can handle distributed computing:

import ray
ray.init()

@ray.remote
def f(x):
    return x * x

futures = [f.remote(i) for i in range(4)]
print(ray.get(futures)) # [0, 1, 4, 9]

First execute ray.init(), then add the @ray.remote decorator before the function that should run as a distributed task. The @ray.remote decorator can also decorate a class:

import ray
ray.init()

@ray.remote
class Counter(object):
    def __init__(self):
        self.n = 0

    def increment(self):
        self.n += 1

    def read(self):
        return self.n

counters = [Counter.remote() for i in range(4)]
tmp1 = [c.increment.remote() for c in counters]
tmp2 = [c.increment.remote() for c in counters]
tmp3 = [c.increment.remote() for c in counters]
futures = [c.read.remote() for c in counters]
print(ray.get(futures)) # [3, 3, 3, 3]
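A note on how this works: f.remote(i) returns an object reference (a future) immediately rather than the result itself, and ray.get() blocks until the results are ready. Object references can also be passed straight into other remote functions, so tasks can be chained without fetching intermediate results. A minimal sketch (the names square and add are just for illustration):

import ray
ray.init()

@ray.remote
def square(x):
    return x * x

@ray.remote
def add(a, b):
    # Ray resolves object references passed as arguments before running the task
    return a + b

# Chain tasks by passing object references; no intermediate ray.get() is needed
total = add.remote(square.remote(3), square.remote(4))
print(ray.get(total))  # 25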

Of course, the distributed computation above still runs on your own machine, just in a distributed form. While the program is running, you can open http://127.0.0.1:8265/#/ in a browser to view the Ray dashboard (installed with ray[default]) and check how the distributed tasks are executing.

So how to implement Ray cluster computing? Then look down.

Using Ray clustering

One of Ray's advantages is the ability to use multiple machines in the same program. Ray can of course run on a single machine, and usually that is all you have, but its real power shows when it runs on a group of machines.

A Ray cluster consists of a head node and a set of worker nodes. You start the head node first, then give each worker node the head node's address to form the cluster.

You can use the Ray Cluster Launcher to provision machines and start a multi-node Ray cluster. The launcher works on AWS, GCP, Azure, Kubernetes, Alibaba Cloud, on-premises machines, and Staroid, and even with a custom node provider.

Ray clusters can also leverage the Ray Autoscaler, which lets Ray interact with a cloud provider to request or release instances according to the cluster spec and the application's workload.
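As a rough sketch, a Cluster Launcher config is a YAML file along these lines (the cluster name, provider, region, and worker count below are illustrative placeholders, not values from this article):

# minimal-cluster.yaml -- illustrative values only
cluster_name: minimal
max_workers: 2
provider:
    type: aws
    region: us-west-2

You would then bring the cluster up with ray up minimal-cluster.yaml and shut it down with ray down minimal-cluster.yaml.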

Now let's quickly demonstrate a Ray cluster. Here we use Docker to start two Ubuntu containers to simulate the cluster:

  • Environment 1: 172.17.0.2 as the head node
  • Environment 2: 172.17.0.3 as a worker node (there can be multiple worker nodes)

Specific steps:

1. Download the Ubuntu image

docker pull ubuntu

2. Start the Ubuntu containers and install the dependencies

Start the first:

docker run -it --name ubuntu-01 ubuntu bash

Start the second:

docker run -it --name ubuntu-02 ubuntu bash

Check their IP addresses:

$ docker inspect -f "{{ .NetworkSettings.IPAddress }}" ubuntu-01
172.17.0.2
$ docker inspect -f "{{ .NetworkSettings.IPAddress }}" ubuntu-02
172.17.0.3

Then install Python, pip, and Ray inside each container:

apt update && apt install -y python3 python3-pip
pip3 install ray

3. Start the head node and the worker node

Select one of the containers as the head node; here we use 172.17.0.2. Execute:

ray start --head --node-ip-address 172.17.0.2

The default port is 6379; you can change it with the --port parameter. After startup, the command prints the address and Redis password that worker nodes need in order to join. To bind another node to this head node, run the following on that node:

ray start --address='172.17.0.2:6379' --redis-password='5241590000000000'

Execute the above command in the other container (172.17.0.3) to start the worker node and join it to the cluster.
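To confirm that the worker has joined, you can run ray status on the head node (a quick check; the exact output depends on the Ray version):

ray status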

To stop Ray on a node:

ray stop

4. Perform tasks

Select any node and execute the following script, changing the parameters of the ray.init() call to match your head node:

from collections import Counter
import socket
import time

import ray

ray.init(address='172.17.0.2:6379', _redis_password='5241590000000000')

print('''This cluster consists of
    {} nodes in total
    {} CPU resources in total
'''.format(len(ray.nodes()), ray.cluster_resources()['CPU']))

@ray.remote
def f():
    time.sleep(0.001)
    # Return IP address.
    return socket.gethostbyname(socket.gethostname())

object_ids = [f.remote() for _ in range(10000)]
ip_addresses = ray.get(object_ids)

print('Tasks executed')
for ip_address, num_tasks in Counter(ip_addresses).items():
    print('    {} tasks on {}'.format(num_tasks, ip_address))

The results are as follows:

You can see that 172.17.0.2 executed 4751 tasks and 172.17.0.3 executed 5249 tasks, achieving the effect of distributed computing.

Last words

With Ray, you can implement parallel computing without using Python's multiprocessing. Today's machine learning workloads are mostly compute-intensive, and without distributed computing they can be very slow; Ray provides a simple solution for distributing that work.

Topics: Python