scikit-learn SGDClassifier partial_fit does not learn incrementally. Returns "classes should include all valid labels"

Question

I pass two streams of data to the sgd_clf classifier, as shown in the code below. The first partial_fit call takes the first stream, x1, y1. The second partial_fit call takes the second stream, x2, y2.

The code below raises an error at the second partial_fit step, saying that the class labels must be included beforehand. The error goes away when I include all the data from x2, y2 in x1, y1, because my class labels are then included before the second partial_fit call.

However, I cannot provide the x2, y2 data in advance. If I have to supply all my data before the first partial_fit(), why would I need a second partial_fit() at all? In fact, if I knew all the data beforehand, I would not need partial_fit(); I could just call fit().

from sklearn import linear_model
import numpy as np

def train_new_data():

    sgd_clf = linear_model.SGDClassifier()

    x1 = [[8, 9], [20, 22]]
    y1 = [5, 6]

    classes = np.unique(y1)

    #print(classes)

    sgd_clf.partial_fit(x1,y1,classes=classes)

    x2 = [10, 12]
    y2 = 8


    sgd_clf.partial_fit([x2], [y2], classes=classes)  # Error here: label 8 is not in classes

    return sgd_clf

if __name__ == "__main__":

    print(train_new_data().predict([[20,22]]))

Q1: Is my understanding of partial_fit() for sklearn classifiers wrong? I assumed it takes data on the fly, as described here: Incremental Learning

Q2: I want to retrain or update a model with new data, without training from scratch. Will partial_fit help me with this?

Q3: I am not tied to SGDClassifier. I can use any algorithm that supports online/batch learning. My main intention is Q3. I have a model trained on thousands of images. I do not want to retrain this model from scratch just because I have one or two new image samples. Nor am I interested in creating a new model for each new entry and then combining them all, since searching across all the trained models slows down prediction. I just want to add these new data instances to the trained model with the help of partial_fit. Is this feasible?

Q4: If I cannot achieve Q2 with scikit-learn classifiers, please point me to how I can achieve this.

Any suggestions, ideas, or references are much appreciated.

Recommended answer

You need to know beforehand which classes you are going to need. After the first call to partial_fit, the algorithm assumes you will not have any new classes to add later.

In your example, you are adding a new class (y2 = 8) that has never been seen before and was not declared in your initial call to partial_fit (which only contained the labels 5 and 6). You need to add it to the classes argument on the first call.
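As a minimal sketch of that fix, assuming the full label set (5, 6 and 8) is known before any data arrives, the question's code could be adjusted as follows. The name all_classes and the random_state value are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Assumption: every label that can ever appear is known up front,
# even though the first stream only contains 5 and 6.
all_classes = np.array([5, 6, 8])

sgd_clf = SGDClassifier(random_state=0)

# First stream: declare ALL possible classes on the first call.
x1 = [[8, 9], [20, 22]]
y1 = [5, 6]
sgd_clf.partial_fit(x1, y1, classes=all_classes)

# Second stream: label 8 is accepted because it was declared above.
# The classes argument may be omitted on later calls.
x2 = [[10, 12]]
y2 = [8]
sgd_clf.partial_fit(x2, y2)

print(sgd_clf.classes_)
print(sgd_clf.predict([[20, 22]]))
```

The data itself still arrives incrementally; only the set of possible labels has to be fixed in advance.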

I would also recommend numbering your classes starting from 0, just for consistency's sake.
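One possible way to get labels numbered from 0 is scikit-learn's LabelEncoder. This is only a sketch under the same assumption that the full label set (5, 6, 8) is known in advance; the variable names are illustrative.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import LabelEncoder

# Assumption: the complete set of original labels is known in advance.
encoder = LabelEncoder().fit([5, 6, 8])

# Map the original labels 5, 6, 8 to consecutive integers 0, 1, 2.
encoded_classes = encoder.transform(encoder.classes_)

clf = SGDClassifier(random_state=0)

# Train on encoded labels, declaring every encoded class on the first call.
clf.partial_fit([[8, 9], [20, 22]], encoder.transform([5, 6]),
                classes=encoded_classes)
clf.partial_fit([[10, 12]], encoder.transform([8]))

# Convert predictions back to the original label values.
pred = encoder.inverse_transform(clf.predict([[20, 22]]))
print(pred)
```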
