Scikit Learn RandomForest Memory Error
Problem description
I am trying to run the scikit-learn random forest algorithm on the MNIST handwritten digits dataset. During training, the system runs into a MemoryError. What should I do to fix this?
Machine: Intel Core 2 Duo with 4 GB RAM.
The shape of the dataset is (60000, 784). The complete error from the Linux terminal is as follows:
>   File "./reducer.py", line 53, in <module>
>     main()
>   File "./reducer.py", line 38, in main
>     clf = clf.fit(data,labels) #training the algorithm
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 202, in fit
>     for i in xrange(n_jobs))
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 409, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 295, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 101, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 73, in _parallel_build_trees
>     sample_mask=sample_mask, X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 476, in fit
>     X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 357, in _build_tree
>     np.argsort(X.T, axis=1).astype(np.int32).T)
>   File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 680, in argsort
>     return argsort(axis, kind, order)
> MemoryError
Recommended answer
Either set n_jobs=1 or upgrade to the bleeding-edge (development) version of scikit-learn. The problem is that the currently released version uses multiple processes to fit trees in parallel, which means that the data (X and y) must be copied into each of these processes. The next release will use threads instead of processes, so the tree learners share memory.
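The answer's argument (process workers each get their own copy of the data, thread workers share one) can be illustrated with a toy model. peak_bytes is a hypothetical helper, not part of scikit-learn or joblib. It assumes that process-based workers each receive a full copy of the data in addition to the parent's copy, that thread-based workers share a single copy, that n_jobs=1 runs in the calling process, and that every worker needs some scratch space; the byte counts in the usage lines are illustrative assumptions.

```python
# Hypothetical model (not scikit-learn or joblib code) of peak memory when
# fitting trees in parallel. Every worker is assumed to need scratch space.


def peak_bytes(data_bytes, scratch_bytes, n_jobs, backend):
    """Estimate peak memory for n_jobs parallel tree builders."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be >= 1")
    if backend not in ("processes", "threads"):
        raise ValueError(f"unknown backend: {backend!r}")
    if n_jobs == 1 or backend == "threads":
        # One copy of the training data, shared by every builder in-process.
        return data_bytes + n_jobs * scratch_bytes
    # The parent keeps its copy and each worker process receives another one.
    return data_bytes + n_jobs * (data_bytes + scratch_bytes)


MIB = 2**20
data = 60000 * 784 * 8      # X as float64 (assumption)
scratch = 60000 * 784 * 12  # e.g. argsort indices + int32 copy (assumption)

for n_jobs in (1, 2):
    for backend in ("processes", "threads"):
        peak = peak_bytes(data, scratch, n_jobs, backend)
        print(f"n_jobs={n_jobs} {backend:<9} ~{peak / MIB:,.0f} MiB")
```

In this model, switching from processes to threads removes one full data copy per worker, which is the saving the answer describes. In scikit-learn, the first suggestion amounts to constructing the classifier as RandomForestClassifier(n_jobs=1).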