Mini batch-training of a scikit-learn classifier where I provide the mini batches

Question

I have a very big dataset that cannot be loaded into memory.

I want to use this dataset as the training set of a scikit-learn classifier, for example a LogisticRegression.

Is it possible to perform mini-batch training of a scikit-learn classifier where I provide the mini batches myself?

Recommended answer

I believe that some of the classifiers in sklearn have a partial_fit method. This method lets you pass minibatches of data to the classifier so that the model is updated incrementally on each one (for SGD-based estimators, by gradient descent steps over the samples in the minibatch). You would simply load a minibatch from disk, pass it to partial_fit, release the minibatch from memory, and repeat.

If you are particularly interested in doing this for Logistic Regression, then you'll want to use SGDClassifier, which fits a logistic regression model when loss='log_loss' (spelled loss='log' in scikit-learn versions before 1.1).

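You simply pass the features and labels for your minibatch to partial_fit in the same way that you would use fit. The one difference for classifiers is that the first call must also be given every class label through the classes argument, since a single minibatch may not contain all of them:

# all_classes: array of every label in the full dataset (required on the first call)
clf.partial_fit(X_minibatch, y_minibatch, classes=all_classes)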

Update:

I recently came across the dask-ml library which would make this task very easy by combining dask arrays with partial_fit. There is an example on the linked webpage.
