LogisticRegression:未知标签类型:在python中使用sklearn的"continuous" [英] LogisticRegression: Unknown label type: 'continuous' using sklearn in python
问题描述
我有以下代码来测试sklearn python库的一些最流行的ML算法:
I have the following code to test some of most popular ML algorithms of sklearn python library:
import numpy as np
from sklearn import metrics, svm
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
trainingData = np.array([ [2.3, 4.3, 2.5], [1.3, 5.2, 5.2], [3.3, 2.9, 0.8], [3.1, 4.3, 4.0] ])
trainingScores = np.array( [3.4, 7.5, 4.5, 1.6] )
predictionData = np.array([ [2.5, 2.4, 2.7], [2.7, 3.2, 1.2] ])
clf = LinearRegression()
clf.fit(trainingData, trainingScores)
print("LinearRegression")
print(clf.predict(predictionData))
clf = svm.SVR()
clf.fit(trainingData, trainingScores)
print("SVR")
print(clf.predict(predictionData))
clf = LogisticRegression()
clf.fit(trainingData, trainingScores)
print("LogisticRegression")
print(clf.predict(predictionData))
clf = DecisionTreeClassifier()
clf.fit(trainingData, trainingScores)
print("DecisionTreeClassifier")
print(clf.predict(predictionData))
clf = KNeighborsClassifier()
clf.fit(trainingData, trainingScores)
print("KNeighborsClassifier")
print(clf.predict(predictionData))
clf = LinearDiscriminantAnalysis()
clf.fit(trainingData, trainingScores)
print("LinearDiscriminantAnalysis")
print(clf.predict(predictionData))
clf = GaussianNB()
clf.fit(trainingData, trainingScores)
print("GaussianNB")
print(clf.predict(predictionData))
clf = SVC()
clf.fit(trainingData, trainingScores)
print("SVC")
print(clf.predict(predictionData))
前两个工作正常,但是在LogisticRegression
调用中出现以下错误:
The first two works ok, but I got the following error in LogisticRegression
call:
root@ubupc1:/home/ouhma# python stack.py
LinearRegression
[ 15.72023529 6.46666667]
SVR
[ 3.95570063 4.23426243]
Traceback (most recent call last):
File "stack.py", line 28, in <module>
clf.fit(trainingData, trainingScores)
File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1174, in fit
check_classification_targets(y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
输入数据与前面的调用中的相同,所以这里发生了什么?
The input data is the same as in the previous calls, so what is going on here?
顺便说一句,为什么在LinearRegression()
和SVR()
算法(15.72 vs 3.95)
的首次预测中会有很大差异?
And by the way, why there is a huge diference in the first prediction of LinearRegression()
and SVR()
algorithms (15.72 vs 3.95)
?
推荐答案
您正在将浮点数传递给分类器,该分类器期望将分类值作为目标向量.如果将其转换为int
,它将被接受为输入(尽管这样做是否正确还是值得怀疑的).
You are passing floats to a classifier which expects categorical values as the target vector. If you convert it to int
it will be accepted as input (although it will be questionable if that's the right way to do it).
使用scikit的 labelEncoder
功能.
It would be better to convert your training scores by using scikit's labelEncoder
function.
DecisionTree和KNeighbors限定词也是如此.
The same is true for your DecisionTree and KNeighbors qualifier.
from sklearn import preprocessing
from sklearn import utils
lab_enc = preprocessing.LabelEncoder()
encoded = lab_enc.fit_transform(trainingScores)
>>> array([1, 3, 2, 0], dtype=int64)
print(utils.multiclass.type_of_target(trainingScores))
>>> continuous
print(utils.multiclass.type_of_target(trainingScores.astype('int')))
>>> multiclass
print(utils.multiclass.type_of_target(encoded))
>>> multiclass
这篇关于LogisticRegression:未知标签类型:在python中使用sklearn的"continuous"的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!