如何正确运行ray? [英] How to run ray correctly?

查看:181
本文介绍了如何正确运行ray?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

试图了解如何正确使用ray进行编程.

Trying to understand how to correctly program with ray.

以下结果似乎与ray的性能改进不一致,如此处.

The results below do not seem to agree with the performance improvement of ray as explained here.

环境:

  • Python版本:3.6.10
  • ray版本:0.7.4

这是机器规格:

>>> import psutil
>>> psutil.cpu_count(logical=False)
4
>>> psutil.cpu_count(logical=True)
8
>>> mem = psutil.virtual_memory()
>>> mem.total
33707012096 # 32 GB

首先,使用Queue(multiproc_function.py)进行传统的python多重处理:

First, the traditional python multiprocessing with Queue (multiproc_function.py):

import time
from multiprocessing import Process, Queue

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)

def loop(n, nums, q):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    q.put((n, s, t_taken))

if __name__ == '__main__':
    results = []

    nums = list(range(N_LIST_ITEMS))

    q = Queue()

    procs = []
    for i in range(N_PARALLEL):
        procs.append(Process(target=loop, args=(i, nums, q)))

    for proc in procs:
        proc.start()

    for proc in procs:
        n, s, t_taken = q.get()
        results.append((n, s, t_taken))

    for proc in procs:
        proc.join()

    for r in results:
        print(r)

结果是:

$ time python multiproc_function.py
n = 0
n = 1
n = 2
n = 3
n = 4
n = 5
n = 6
n = 7
(0, 4999999950000000, 11.12)
(1, 4999999950000000, 11.14)
(2, 4999999950000000, 11.1)
(3, 4999999950000000, 11.23)
(4, 4999999950000000, 11.2)
(6, 4999999950000000, 11.22)
(7, 4999999950000000, 11.24)
(5, 4999999950000000, 11.54)

real    0m19.156s
user    1m13.614s
sys     0m24.496s

在运行期间检查htop时,内存从2.6 GB的基本消耗增加到8 GB,并且所有8个处理器都被完全消耗.另外,从user+sys> real可以清楚地看到并行处理正在发生.

When inspecting htop during the run, the memory went from 2.6 GB base consumption to 8 GB, and had all the 8 processors fully consumed. Also, it is clear from user+sys > real that parallel processing is happening.

这是射线测试代码(ray_test.py):

Here is the ray test code (ray_test.py):

import time
import psutil
import ray

N_PARALLEL = 8
N_LIST_ITEMS = int(1e8)

use_logical_cores = False
num_cpus = psutil.cpu_count(logical=use_logical_cores)
if use_logical_cores:
    print(f"Setting num_cpus to # logical cores  = {num_cpus}")
else:
    print(f"Setting num_cpus to # physical cores = {num_cpus}")
ray.init(num_cpus=num_cpus)

@ray.remote
def loop(nums, n):
    print(f"n = {n}")
    s = 0
    start = time.perf_counter()
    for e in nums:
        s += e
    t_taken = round(time.perf_counter() - start, 2)
    return (n, s, t_taken)

if __name__ == '__main__':
    nums = list(range(N_LIST_ITEMS))
    list_id = ray.put(nums)
    results = ray.get([loop.remote(list_id, i) for i in range(N_PARALLEL)])
    for r in results:
        print(r)

结果是:

$ time python ray_test.py
Setting num_cpus to # physical cores = 4
2020-04-28 16:52:51,419 INFO resource_spec.py:205 -- Starting Ray with 18.16 GiB memory available for workers and up to 9.11 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
(pid=78483) n = 2
(pid=78485) n = 1
(pid=78484) n = 3
(pid=78486) n = 0
(pid=78484) n = 4
(pid=78483) n = 5
(pid=78485) n = 6
(pid=78486) n = 7
(0, 4999999950000000, 5.12)
(1, 4999999950000000, 5.02)
(2, 4999999950000000, 4.8)
(3, 4999999950000000, 4.43)
(4, 4999999950000000, 4.64)
(5, 4999999950000000, 4.61)
(6, 4999999950000000, 4.84)
(7, 4999999950000000, 4.99)

real    0m45.082s
user    0m22.163s
sys     0m10.213s

real时间比python多处理时间长得多.此外,real大于user+sys.检查htop时,内存高达30 GB,并且内核也没有完全饱和.所有这些似乎与ray应该做什么相反.

The real time is much longer than that of python multiprocessing. Also, real is greater than user+sys. When inspecting htop, the memory went up to 30 GB and the cores were also not fully saturated. All these seem to contradict what ray is supposed to do.

然后将use_logical_cores设置为True.由于内存不足,运行被终止:

Then I set use_logical_cores to True. The run gets killed due to out of memory:

$ time python ray_test.py
Setting num_cpus to # logical cores  = 8
2020-04-28 16:27:43,709 INFO resource_spec.py:205 -- Starting Ray with 17.29 GiB memory available for workers and up to 8.65 GiB for objects. You can adjust these settings with ray.remote(memory=<bytes>, object_store_memory=<bytes>).
Killed

real    0m25.205s
user    0m15.056s
sys     0m4.028s

我在这里做错什么了吗?

Am I doing something wrong here?

推荐答案

首先,Ray不保证CPU亲和力或资源隔离.这可能是其CPU使用率不饱和的原因. (不过,我不确定100%).您可以尝试使用psutil设置cpu亲和力,然后查看核心是否仍未饱和. ( https://psutil.readthedocs.io/en/latest/#psutil .Process.cpu_affinity ).

Firstly, Ray doesn't guarantee the CPU affinity or resource isolation. That could be the reason why it have non-saturated CPU usage. (I am not 100% sure though). You can try setting cpu affinity using psutil and see if the cores are still not saturated. (https://psutil.readthedocs.io/en/latest/#psutil.Process.cpu_affinity).

关于结果,您介意尝试使用最新版本的Ray吗?在性能和性能方面取得了相当不错的进步从0.7.4版本开始的Ray中的内存管理.

About the result, would you mind trying the newest version of Ray? There was pretty good progress in performance & memory management in Ray from the version 0.7.4.

这篇关于如何正确运行ray?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆