在anaconda python发行版中使用scikit-learn的freeze_support错误? [英] freeze_support bug in using scikit-learn in the Anaconda python distro?
问题描述
我只想确保这与我的代码无关,但需要在相关的Python包中进行修复. (顺便说一句,这看起来像我甚至可以在供应商发布更新之前就可以手动修补的东西吗?)我使用的是scikit-learn-0.15b1,称为这些.谢谢!
I just want to be sure this is not about my code but it needs to be fixed in the relevant Python package. (By the way, does this look like something I can manually patch even before the vendor ships an update?) I was using scikit-learn-0.15b1 which called these. Thanks!
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Anaconda\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Anaconda\lib\multiprocessing\forking.py", line 495, in prepare
'__parents_main__', file, path_name, etc
File "H:\Documents\GitHub\health_wealth\code\controls\lasso\scikit_notreat_predictors.py", line 36, in <module>
gs.fit(X_train, y_train)
File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 597, in fit
return self._fit(X, y, ParameterGrid(self.param_grid))
File "C:\Anaconda\lib\site-packages\sklearn\grid_search.py", line 379, in _fit
for parameters in parameter_iterable
File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.py", line 604, in __call__
self._pool = MemmapingPool(n_jobs, **poolargs)
File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 559, in __init__
super(MemmapingPool, self).__init__(**poolargs)
File "C:\Anaconda\lib\site-packages\sklearn\externals\joblib\pool.py", line 400, in __init__
super(PicklingPool, self).__init__(**poolargs)
File "C:\Anaconda\lib\multiprocessing\pool.py", line 159, in __init__
self._repopulate_pool()
File "C:\Anaconda\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
w.start()
File "C:\Anaconda\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Anaconda\lib\multiprocessing\forking.py", line 258, in __init__
cmd = get_command_line() + [rhandle]
File "C:\Anaconda\lib\multiprocessing\forking.py", line 358, in get_command_line
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
更新:这是我编辑过的脚本,但是在生成GridSearchCV的进程后仍然会导致完全相同的错误.实际上,在命令执行后,相当多的人报告了它将执行多少次折叠和配合,但除此之外,我不知道它何时崩溃.我可以把freeze_support放在其他地方吗?
UPDATE: Here is my edited script, but it still leads to the exact same error after it spawned the processes for GridSearchCV. Actually, quite some after the command reported how many folds and fits it will do, but other than that I don't know when it crashes. Shall I put freeze_support somewhere else?
import scipy as sp
import numpy as np
import pandas as pd
import multiprocessing as mp
if __name__=='__main__':
mp.freeze_support()
print("Started.")
# n = 10**6
# notreatadapter = iopro.text_adapter('S:/data/controls/notreat.csv', parser='csv')
# X = notreatadapter[1:][0:n]
# y = notreatadapter[0][0:n]
notreatdata = pd.read_stata('S:/data/controls/notreat.dta')
X = notreatdata.iloc[:,1:]
y = notreatdata.iloc[:,0]
n = y.shape[0]
print("Data lodaded.")
from sklearn import cross_validation
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.4, random_state=0)
print("Data split.")
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train) # Don't cheat - fit only on training data
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test) # apply same transformation to test data
print("Data scaled.")
# build a model
from sklearn.linear_model import SGDClassifier
model = SGDClassifier(penalty='elasticnet',n_iter = np.ceil(10**6 / n),shuffle=True)
#model.fit(X,y)
print("CV starts.")
from sklearn import grid_search
# run grid search
param_grid = [{'alpha' : 10.0**-np.arange(1,7),'l1_ratio':[.05, .15, .5, .7, .9, .95, .99, 1]}]
gs = grid_search.GridSearchCV(model,param_grid,n_jobs=8,verbose=1)
gs.fit(X_train, y_train)
print("Scores for alphas:")
print(gs.grid_scores_)
print("Best estimator:")
print(gs.best_estimator_)
print("Best score:")
print(gs.best_score_)
print("Best parameters:")
print(gs.best_params_)
推荐答案
这可能意味着您在Windows上,并且忘记了在主模块中使用正确的习惯用法:
This probably means that you are on Windows and you have forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()
这篇关于在anaconda python发行版中使用scikit-learn的freeze_support错误?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!