How to parallelize a for loop in Python inside a class?


Question

I have a Python function funz that returns a different array of length p each time it is called. I need to run this function several times and then compute the mean of each value.

I can do this with a for loop, but it takes a lot of time.

I am trying to use the multiprocessing library, but I run into an error.

import sklearn as sk
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn import preprocessing,linear_model, cross_validation
from scipy import stats
from multiprocessing import Pool


class stabilize(BaseEstimator,TransformerMixin):

    def __init__(self,sim=3,n_folds=3):
        self.sim=sim
        self.n_folds=n_folds

    def fit(self,X,y):
        self.n,self.p=X.shape
        self.X=X
        self.y=y        
        self.beta=np.zeros(shape=(self.sim,self.p))
        self.alpha_min=[]        
        self.mapper=p.map(self.multiple_cv,[1]*self.sim)    

    def multiple_cv(self,o):
        kf=sk.cross_validation.KFold(self.n,n_folds=self.n_folds,shuffle=True)
        cv=sk.linear_model.LassoCV(cv=kf).fit(self.X,self.y)
        beta=cv.coef_
        alpha_min=cv.alpha_
        return alpha_min

I used a dummy variable o to tell how many parallel processes I would like to use. This is not very elegant and may be part of the error. The variables X and y are already attributes of the class, so I have no arguments to pass to the function multiple_cv.

When I run the program I get this error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed

Recommended answer

Your problem is that the function you want to call is an instance method of an object. In Python 2.7, an instance method cannot be pickled, so it cannot be serialized and sent to another process. I see two solutions:

  1. Use a separate, globally available (module-level) function:

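class stabilize(BaseEstimator,TransformerMixin):
    ...
    def fit(self,X,y):
        ...
        # pass the module-level function, not the bound method
        self.mapper=p.map(multiple_cv,[(self, 1)]*self.sim)

# defined at module level so that it can be pickled by name
def multiple_cv(args):
    self,o=args  # unpack the (instance, dummy) tuple
    ...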

  2. Make the object's methods serializable using VeryPicklableObject and its dependencies:

        @picklableInstancemethod
        def multiple_cv(self, o):
            ...
    
