Scikit Learn RandomForest Memory Error


Question

I am trying to run the scikit-learn random forest algorithm on the MNIST handwritten digits dataset. During training, the program fails with a MemoryError. What should I do to fix this?

System specs: Intel Core 2 Duo with 4 GB RAM

The shape of the dataset is (60000, 784). The complete error from the Linux terminal is as follows:

>   File "./reducer.py", line 53, in <module>
>     main()
>   File "./reducer.py", line 38, in main
>     clf = clf.fit(data,labels) #training the algorithm
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 202, in fit
>     for i in xrange(n_jobs))
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 409, in __call__
>     self.dispatch(function, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 295, in dispatch
>     job = ImmediateApply(func, args, kwargs)
>   File "/usr/lib/pymodules/python2.7/joblib/parallel.py", line 101, in __init__
>     self.results = func(*args, **kwargs)
>   File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 73, in _parallel_build_trees
>     sample_mask=sample_mask, X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 476, in fit
>     X_argsorted=X_argsorted)
>   File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 357, in _build_tree
>     np.argsort(X.T, axis=1).astype(np.int32).T)
>   File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 680, in argsort
>     return argsort(axis, kind, order)
> MemoryError

Recommended answer

Either set n_jobs=1 or upgrade to the bleeding-edge (development) version of scikit-learn. The problem is that the currently released version uses multiple processes to fit trees in parallel, which means that the data (X and y) must be copied to each of those processes. The next release will use threads instead of processes, so the tree learners share memory.
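
To get a rough feel for why copying the data per process matters on a 4 GB machine, here is a back-of-the-envelope sketch in plain Python. It rests on assumptions not stated in the question: that the data is stored as float64 (8 bytes per value), and that each worker process holds one full copy of X plus one int32 index array of the same shape (the kind the `np.argsort(...).astype(np.int32)` line in the traceback builds). The helper names are made up for illustration and are not part of scikit-learn.

```python
def array_bytes(n_rows, n_cols, itemsize):
    """Bytes needed for a dense n_rows x n_cols array with the given item size."""
    return n_rows * n_cols * itemsize


def estimate_copies_mib(n_rows, n_cols, itemsize, n_copies):
    """Approximate MiB used when the array exists n_copies times in memory."""
    return array_bytes(n_rows, n_cols, itemsize) * n_copies / (1024.0 ** 2)


N_ROWS, N_COLS = 60000, 784  # dataset shape given in the question
FLOAT64_BYTES = 8            # assumption: X is stored as float64
INT32_BYTES = 4              # the traceback casts the argsort result to int32

for n_jobs in (1, 2, 4):
    # Simplifying assumption: one copy of X and one index array per process.
    x_mib = estimate_copies_mib(N_ROWS, N_COLS, FLOAT64_BYTES, n_jobs)
    idx_mib = estimate_copies_mib(N_ROWS, N_COLS, INT32_BYTES, n_jobs)
    print("n_jobs=%d: X ~%.0f MiB, sorted indices ~%.0f MiB"
          % (n_jobs, x_mib, idx_mib))
```

These figures ignore temporaries, the trees themselves, and the interpreter's own overhead, so actual usage will be higher. The point is only that the footprint grows roughly linearly with the number of processes, which is why passing n_jobs=1 to the random forest constructor can help.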
