Using Python generators in scikit-learn

Views: 88 | Posted: 2020/11/13 3:35:44 | Tags: python, generator, scikit-learn, random-forest

Question

I was wondering whether, and how, it is possible to use a Python generator as the data input to a scikit-learn classifier's .fit() function. Given the huge amount of data I have, this seems to make sense to me.

In particular, I am about to implement a random forest approach.

Regards, K

Recommended answer

The answer is "no". To do out-of-core learning with random forests, you should:

split your data into reasonably sized batches (restricted by the amount of RAM you have; bigger is better);
train a separate random forest on each batch;
append all the underlying trees together in the estimators_ member of one of the forests (untested):

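# Pool the trees of every other forest into the first one (Python 3)
for forest in forests[1:]:
    forests[0].estimators_.extend(forest.estimators_)
forests[0].n_estimators = len(forests[0].estimators_)  # keep the count in sync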
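The batch-and-merge steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the batches() generator and the fit_merged_forest() helper are hypothetical names, the data is synthetic, and the merge is only meaningful when every batch contains the same set of classes. Extending estimators_ by hand is not a documented scikit-learn feature, so treat it as the same untested hack the answer describes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def batches(n_batches, batch_size, seed=0):
    """Hypothetical data source: yields (X, y) batches of synthetic data."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 4))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)
        yield X, y


def fit_merged_forest(batch_iter, trees_per_batch=10):
    """Train one forest per batch, then pool all trees in the first forest."""
    forests = []
    for i, (X, y) in enumerate(batch_iter):
        forest = RandomForestClassifier(n_estimators=trees_per_batch,
                                        random_state=i)
        forests.append(forest.fit(X, y))

    merged = forests[0]
    for forest in forests[1:]:
        # The trees can only vote together if they share the same class labels.
        if not np.array_equal(forest.classes_, merged.classes_):
            raise ValueError("every batch must contain the same classes")
        merged.estimators_.extend(forest.estimators_)
    # Keep the tree count consistent with the pooled list.
    merged.n_estimators = len(merged.estimators_)
    return merged


# Only one batch is held in memory at a time while training.
model = fit_merged_forest(batches(n_batches=3, batch_size=200))
X_test, y_test = next(batches(n_batches=1, batch_size=100, seed=1))
accuracy = model.score(X_test, y_test)
```

Each tree in the merged forest has still seen only its own batch, so the result is not equivalent to one forest trained on all of the data at once.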

(Yes, this is hacky, but no solution to this problem has been found yet. Note that with very large datasets, it might pay to just sample a number of training examples that fits in the RAM of a big machine instead of training on all of the data. Another option is to switch to linear models trained with SGD. Those implement a partial_fit method, but obviously they are limited in the kind of functions they can learn.)
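For the SGD alternative, a minimal sketch of feeding a generator to partial_fit might look like the following. The stream_batches() generator is a hypothetical stand-in for data read lazily from disk, and the data is synthetic. SGDClassifier.partial_fit needs the full list of class labels on the first call, which is why all_classes is fixed up front here.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def stream_batches(n_batches, batch_size, seed=0):
    """Hypothetical generator standing in for data read lazily from disk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, 4))
        y = (X[:, 0] - X[:, 1] > 0).astype(int)
        yield X, y


clf = SGDClassifier(random_state=0)
all_classes = np.array([0, 1])  # assumed known before training starts

# Each call updates the same linear model with one batch, so only a
# single batch needs to fit in memory at any time.
for X_batch, y_batch in stream_batches(n_batches=20, batch_size=100):
    clf.partial_fit(X_batch, y_batch, classes=all_classes)

X_test, y_test = next(stream_batches(n_batches=1, batch_size=200, seed=1))
sgd_accuracy = clf.score(X_test, y_test)
```

This works well here only because the synthetic labels are linearly separable; as the answer notes, a linear model cannot capture the kinds of functions a random forest can.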
