Julia equivalent of Python multiprocessing.Pool.map

Question

My multiprocessing needs are very simple: I work in machine learning, and I sometimes need to evaluate one algorithm on multiple datasets, or multiple algorithms on one dataset, and so on. I just need to run a function with some arguments and get a number back.

I need no RPC, no shared data, nothing.

In Julia, I am getting an error with the following code:

type Model
    param
end

# 1. I have several algorithms/models
models = [Model(i) for i in 1:50]

# 2. I have one dataset
X = rand(50, 5)

# 3. I want to parallelize this function
@everywhere function transform(m)
    sum(X .* m.param)
end

addprocs(3)
println(pmap(transform, models))

I keep getting errors such as:

ERROR: LoadError: On worker 2:
UndefVarError: #transform not defined

Also, is there a way to avoid having to put @everywhere everywhere? Can I just say that all variables should be copied over to the workers when they are created (as is done in Python multiprocessing)?

My real code is obviously much more complicated than this, with models spread across several files.

For reference, this is what I would do in Python:

import numpy as np
import time

# 1. I have several algorithms/models
class Model:
    def __init__(self, param):
        self.param = param
models = [Model(i) for i in range(1,51)]

# 2. I have one dataset
X = np.random.random((50, 5))

# 3. I want to parallelize this function
def transform(m):
    return np.sum(X * m.param)

import multiprocessing
pool = multiprocessing.Pool(4)
print(pool.map(transform, models))

Recommended answer

The core issue is that you need to add the processes before you attempt to define things on them. addprocs should always be the first thing you do, even before using (see below). This is why it is often done with the -p flag when you start julia, or with --machinefile <file> (spelled --machine-file in current releases), or with -L <file>.

@everywhere executes the code on all processes that currently exist. Processes added after the @everywhere call do not have the code executed on them.

You also missed a few @everywheres: the Model type and X have to be defined on the workers as well.

addprocs(3)

@everywhere type Model
    param
end

# 1. I have several algorithms/models
models = [Model(i) for i in 1:50]

# 2. I have one dataset
@everywhere X = rand(50, 5)

# 3. I want to parallelize this function
@everywhere function transform(m)
    sum(X .* m.param)
end

println(pmap(transform, models))
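
The listing above uses the pre-1.0 `type` keyword. As a rough sketch only, assuming Julia 1.x (where `addprocs`, `@everywhere` and `pmap` live in the `Distributed` standard library and `type` has become `mutable struct` or `struct`), the same fix might look like the following. One change is my own assumption about the intent: `@everywhere X = rand(50, 5)` makes every process draw its own random matrix, so this sketch creates `X` once and copies it to the workers with `$` interpolation.

```julia
using Distributed            # assumption: Julia 1.x, where these tools are a stdlib

addprocs(3)                  # add workers before anything is defined on them

# Define the type on every process (immutable here; use `mutable struct` if needed).
@everywhere struct Model
    param
end

# Create the dataset once on the master, then copy that same value to all workers.
X = rand(50, 5)
@everywhere X = $X

# Define the function on every process; it reads each process's global X.
@everywhere transform(m) = sum(X .* m.param)

models = [Model(i) for i in 1:50]
results = pmap(transform, models)
println(results)
```

Because every worker now holds the same `X`, the results are comparable across models, which matches what the Python version in the question computes.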

Alternatives with fewer @everywheres:

Use a block to send a whole block of code @everywhere

addprocs(3)
@everywhere begin
    type Model
        param
    end

    X = rand(50, 5)

    function transform(m)
        sum(X .* m.param)
    end
end

models = [Model(i) for i in 1:50]

println(pmap(transform, models))

Use local variables

Local variables (including functions) are sent to the workers as required, though this doesn't help for types.

addprocs(3)

@everywhere type Model
    param
end

function main() 
    X = rand(50, 5)

    models = [Model(i) for i in 1:50]

    function transform(m)
        sum(X .* m.param)
    end

    println(pmap(transform, models))
end

main()

Use modules

When you write using Foo, the module Foo is loaded on all processes, but it is not brought into scope on the workers. This is a bit weird and counterintuitive, so much so that I can't conjure up a working example of it, but someone else might.
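
For what it is worth, here is one way the module route might be sketched on Julia 1.x. This is not the original author's method. It assumes the `Distributed` standard library, and the module name `ModelDefs`, the function `score`, and the temporary file are all hypothetical stand-ins for a real multi-file project. The idea is to `include` the file on every process and then bring the module into scope on every process explicitly.

```julia
using Distributed            # assumption: Julia 1.x

addprocs(3)                  # workers must exist before the code is loaded on them

# Hypothetical stand-in for a source file in a real project.
path = joinpath(mktempdir(), "ModelDefs.jl")
write(path, """
module ModelDefs
export Model, score

struct Model
    param
end

score(m::Model, X) = sum(X .* m.param)

end
""")

@everywhere include($path)   # load the file on the master and on every worker
@everywhere using .ModelDefs # bring the exported names into scope everywhere

function main()
    X = rand(50, 5)                      # local, so it travels with the closure
    models = [Model(i) for i in 1:50]
    return pmap(m -> score(m, X), models)
end

results = main()
println(results)
```

With this layout only two lines carry `@everywhere`, no matter how many types and functions the module defines.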
