Multi-Class Logistic Regression in scikit-learn


Question

I am having trouble calling scikit-learn's logistic regression correctly for the multi-class case. I am using the lbfgs solver, and I have the multi_class parameter set to multinomial.

It is unclear to me how to pass the true class labels when fitting the model. I had assumed it works the same way as the multi-class random forest classifier, where you pass an [n_samples, m_classes] dataframe. When I do this, however, I get an error saying the data has a bad shape: ValueError: bad input shape (20, 5). In this tiny example there were 5 classes and 20 samples.

On inspection, the documentation for the fit method says that the truth values are passed as [n_samples, ], which matches the error I'm getting. But then I have no idea how to train the model with multiple classes. So this is my question: how do I pass the full set of class labels to the fit function?

I've been unable to find sample code for this on the Internet, or this question on StackOverflow, but I feel certain someone must know how to do it!

In the code below, train_features = [n_samples, n_features] and truth_train = [n_samples, m_classes]:

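from sklearn.linear_model import LogisticRegressionCV

# truth_train is 2-D here ([n_samples, m_classes]); this is the call that raises the ValueError
clf = LogisticRegressionCV(class_weight='balanced', multi_class='multinomial', solver='lbfgs')
clf.fit(train_features, truth_train)
pred = clf.predict(test_features)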

Recommended answer

You seem to be confusing the terms multiclass and multilabel (see http://scikit-learn.org/stable/modules/multiclass.html). In short:

  • Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.

So the data is [n_samples, n_features] and the labels are [n_samples].

  • Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as the topics that are relevant for a document. A text might be about any of religion, politics, finance or education at the same time, or none of these.

So the data is [n_samples, n_features] and the labels are [n_samples, n_labels].
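If each sample really belongs to exactly one class and the 2-D label matrix is just a one-hot encoding, the multiclass case only needs that matrix collapsed to a 1-D vector of class indices. The following is a minimal sketch under that assumption; the data is synthetic, and names such as `y_onehot` and `y_1d` are illustrative rather than taken from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the question's data: 20 samples, 5 classes (assumed sizes).
rng = np.random.default_rng(0)
n_samples, n_features, m_classes = 20, 4, 5
X = rng.normal(size=(n_samples, n_features))

# 1-D multiclass labels; cycling through the classes guarantees each one appears.
y = np.arange(n_samples) % m_classes

# One-hot matrix of shape (20, 5): the 2-D label layout described in the question.
y_onehot = np.eye(m_classes, dtype=int)[y]

# Collapse the one-hot rows back to a 1-D vector of class indices.
y_1d = y_onehot.argmax(axis=1)

# With 1-D labels, fit accepts any number of classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y_1d)
pred = clf.predict(X)  # shape (n_samples,)
```

The `argmax` step is only valid when every row contains exactly one 1; if rows can contain several, the problem is multilabel and the labels should stay 2-D.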

You seem to be looking for multilabel (for multiclass, the labels should be 1-dimensional). Currently, the only methods in sklearn that support multilabel are: Decision Trees, Random Forests, Nearest Neighbors, Ridge Regression.
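This would also explain why the same 2-D labels worked with a random forest. A small sketch with synthetic data (the label matrix `Y` is built by an arbitrary deterministic rule, purely for illustration) showing a random forest accepting [n_samples, n_labels] targets directly:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# Multilabel indicator matrix of shape (20, 5); the rule below is arbitrary and
# just ensures each row carries one or two labels and each column has both 0 and 1.
Y = ((np.arange(20)[:, None] + np.arange(5)[None, :]) % 3 == 0).astype(int)

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X, Y)             # 2-D targets are accepted
pred = forest.predict(X)     # one column of predictions per label
```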

If you want to learn a multilabel problem with a different model, simply use OneVsRestClassifier as a multilabel wrapper around your LogisticRegression:

http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier
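As a rough sketch of that wrapper approach, assuming a binary indicator matrix as the multilabel target (the data and the rule generating `Y` are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# Multilabel indicator matrix, shape (n_samples, n_labels); arbitrary synthetic rule.
Y = ((np.arange(20)[:, None] + np.arange(5)[None, :]) % 3 == 0).astype(int)

# One independent binary logistic regression is fitted per label column.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, Y)

pred = ovr.predict(X)         # indicator matrix, shape (20, 5)
proba = ovr.predict_proba(X)  # per-label probabilities, shape (20, 5)
```

Because each label gets its own binary classifier, the rows of `proba` are not constrained to sum to 1, which is what distinguishes this setup from a multinomial (multiclass) fit.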
