freeze_support bug when using scikit-learn in the Anaconda Python distro?

Question

I just want to be sure that this is not caused by my code and that it needs to be fixed in the relevant Python package. (By the way, does this look like something I can patch manually before the vendor ships an update?) I was using scikit-learn-0.15b1, which made these calls. Thanks!

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
    prepare(preparation_data)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
    '__parents_main__', file, path_name, etc
  File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
    gs.fit(X_train, y_train)
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
    return self._fit(X, y, ParameterGrid(self.param_grid))
  File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
    for parameters in parameter_iterable
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
    self._pool = MemmapingPool(n_jobs, **poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
    super(MemmapingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
    super(PicklingPool, self).__init__(**poolargs)
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
    cmd = get_command_line() + [rhandle]
  File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
    is not going to be frozen to produce a Windows executable.''')
RuntimeError: 
            Attempt to start a new process before the current process
            has finished its bootstrapping phase.

            This probably means that you are on Windows and you have
            forgotten to use the proper idiom in the main module:

                if __name__ == '__main__':
                    freeze_support()
                    ...

            The "freeze_support()" line can be omitted if the program
            is not going to be frozen to produce a Windows executable.

UPDATE: Here is my edited script, but it still leads to the exact same error after it spawns the processes for GridSearchCV. Actually, the crash comes quite some time after the command reports how many folds and fits it will do, but other than that I don't know when it crashes. Should I put freeze_support somewhere else?

import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp

if __name__=='__main__':
    mp.freeze_support()

print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]

print("Data loaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)

print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)  # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)  # apply same transformation to test data

print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)

print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)

print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)

Recommended answer

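The error message already names the cause: you are on Windows and the main module does not use the proper idiom. The code that starts the worker processes, here the gs.fit(...) call, has to run under the if __name__ == '__main__': guard, not just freeze_support():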

if __name__ == '__main__':
    freeze_support()
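
On Windows, multiprocessing starts each worker by importing the main module again, so any statement left at module level runs once more in every child; the traceback shows the child reaching gs.fit(...) this way. The sketch below shows a layout that avoids this. It uses only the standard multiprocessing module as a stand-in for the grid search, and the names square and main are hypothetical, not part of the original script:

```python
import multiprocessing as mp


def square(x):
    # Hypothetical worker function. It is defined at module level so that
    # child processes can find it when they import this module.
    return x * x


def main():
    # Everything that starts processes lives inside this function,
    # so importing the module has no side effects.
    pool = mp.Pool(processes=2)
    try:
        results = pool.map(square, range(5))
    finally:
        pool.close()
        pool.join()
    print(results)
    return results


if __name__ == '__main__':
    # freeze_support() only matters when the script is frozen into a
    # Windows executable; it is harmless otherwise.
    mp.freeze_support()
    main()
```

In the script from the question, the equivalent change would be to move everything from print("Started.") through the final print(gs.best_params_) into such a function and call it under the guard.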
