Scikit-Learn Decision Tree: Probability of prediction being a or b?
Question
I have a basic decision tree classifier built with Scikit-Learn:
# Used to tell men from women based on height and shoe size
from sklearn import tree
# Features: [height, shoe size]
X = [[65,9],[67,7],[70,11],[62,6],[60,7],[72,13],[66,10],[67,7.5]]
Y = ["male","female","male","female","female","male","male","female"]
# Create a decision tree
clf = tree.DecisionTreeClassifier()
# Fit the tree to the data
clf.fit(X, Y)
# Predict the gender of a new sample (predict expects a 2D array, one row per sample)
prediction = clf.predict([[68,9]])
# Print the predicted gender
print(prediction)
When I run the program, it always outputs either "male" or "female", but how can I see the probability of the prediction being male or female? For example, the prediction above returns "male", but how would I get it to print the probability that the sample is male?
Thanks!
Recommended answer
You can do the following:
from sklearn import tree
#load data
X = [[65,9],[67,7],[70,11],[62,6],[60,7],[72,13],[66,10],[67,7.5]]
Y=["male","female","male","female","female","male","male","female"]
#build model
clf = tree.DecisionTreeClassifier()
#fit
clf.fit(X, Y)
#predict
prediction = clf.predict([[68,9],[66,9]])
#probabilities
probs = clf.predict_proba([[68,9],[66,9]])
#print the predicted gender
print(prediction)
print(probs)
Theory
The result of clf.predict_proba(X) is the predicted class probability for each sample, which is the fraction of training samples of each class in the leaf that the sample falls into.
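To make the "fraction of samples in a leaf" idea concrete, here is a minimal sketch that recomputes the probabilities by hand and compares them with predict_proba. It makes two assumptions that are not part of the answer above: the tree is trained on height only, and it is limited with max_depth=1 so that the leaves stay impure (a fully grown tree on this data has pure leaves and therefore only returns 0 or 1). The variable names are illustrative.

```python
import numpy as np
from sklearn import tree

# Same data as in the question: [height, shoe size]
X = np.array([[65, 9], [67, 7], [70, 11], [62, 6],
              [60, 7], [72, 13], [66, 10], [67, 7.5]])
Y = np.array(["male", "female", "male", "female",
              "female", "male", "male", "female"])

# Assumption for illustration: keep only height and allow a single split,
# so at least one leaf contains a mix of both classes.
X_height = X[:, [0]]
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=0)
clf.fit(X_height, Y)

query = np.array([[66]])

# apply() returns the index of the leaf each sample ends up in
leaf_of_query = clf.apply(query)[0]
train_leaves = clf.apply(X_height)

# Training labels that share the query's leaf
labels_in_leaf = Y[train_leaves == leaf_of_query]

# Fraction of each class in that leaf, in the order given by clf.classes_
manual = np.array([np.mean(labels_in_leaf == c) for c in clf.classes_])
proba = clf.predict_proba(query)[0]

print("classes:      ", clf.classes_)
print("manual:       ", manual)
print("predict_proba:", proba)
```

The two printed probability rows should match. Without sample weights, predict_proba is simply this per-leaf class fraction.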
Interpreting the results:
The first print returns ['male' 'male'], so both samples in [[68,9],[66,9]] are predicted as male.
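The second print returns:
[[0. 1.]
 [0. 1.]]
Each row corresponds to one sample and each column to one class. Both samples are predicted as male, and the probability of male is reported in the second column.
To see the order of the classes, use clf.classes_
which returns ['female' 'male'].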
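Because the column order follows clf.classes_, it can be easier to pair each probability with its class label than to remember which column is which. A minimal sketch, where by_class and male_probability are illustrative names and random_state=0 is added only for reproducibility:

```python
from sklearn import tree

# Same data as above: [height, shoe size]
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)

# Probabilities for a single sample; columns follow clf.classes_
probs = clf.predict_proba([[68, 9]])[0]

# Pair every class label with its probability
by_class = {str(label): float(p) for label, p in zip(clf.classes_, probs)}

male_probability = by_class["male"]
print(by_class)
print("P(male) =", male_probability)
```

This way the lookup stays correct even if the set of labels, and therefore the column order, changes.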