Weka预测(百分比置信度)-这是什么意思? [英] Weka prediction (percentage confidence) - what does it mean?

查看:462
本文介绍了Weka预测(百分比置信度)-这是什么意思?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在教自己Weka,并且学习了如何构建模型并从中获取预测(使用CLI进行预测).

I've been teaching myself Weka and have learned how to build models and get predictions out of them (predictions using the CLI).

当我对先前构建的模型中的数据集进行预测时,会得到一列,即预测",也称为每个预测实例的预测置信度.

When I run prediction on a data set from a previously built model I get a column that is the "prediction" also known as prediction confidence for each instance predicted.

我知道置信度百分比是什么意思,但是我的所有预测不应该是我的Weka模型的准确性吗?

I know what percent confidence means but shouldn't all my predictions be the accuracy of my Weka Model?

如果我有一个精度为90%的J48决策树分类器,使用该模型的每个分类实例是否不应该具有90%的预测置信度?

aka if I have a J48 Decision tree classifier with accuracy of 90%, shouldn't every classified instance using this model be 90% prediction confidence?

任何人都知道如何计算此置信度百分比,或者在向其他人介绍我的模型时应如何阅读错误预测和模型准确性?谢谢

Any one know how this percentage confidence is calculated or how I should read the error prediction and model accuracy when telling others about my model? Thanks

推荐答案

基本上,当决策树在数据集上训练时,您通常想停止(或由于缺少功能必须)在每个训练实例上都无法适应之前.发生这种情况时,您将在树中的叶节点上拥有多个训练样本.很多时候,培训标签仍然会混杂在一起(不是所有的正班,也不是所有的负班).

Basically, when a decision tree is training on a dataset, you often want to (or because of missing features have to) stop it before it overfits on every single training instance. When this happens, you will have several training samples at the leaf nodes in the tree. Very often the training labels will still be mixed at that point (not all positive class and not all negative class.)

置信度是某种程度的度量,用于衡量树到该培训实例的叶子时培训标签的一致性.

The confidence is some measure of how consistent the training labels were by the time the tree got down to a leaf for that training instance.

请注意,这还用于以无偏见的方式处理缺失的功能(属性).

note this is also used to handle missing features (attributes) in a clean and unbiased way.

有关此内容的简要说明,请参见此处.

为此还请看Quinlan在决策树上的一些工作.特别是他在C4.5上的工作

Also look at some of Quinlan's work on decision trees for this. Particularly his work on C4.5

也:我知道置信度百分比是什么意思,但我的所有预测不应该是我的Weka模型的准确性吗?"

Also: "I know what percent confidence means but shouldn't all my predictions be the accuracy of my Weka Model?"

不,这是不正确的,某些训练样本比其他训练样本更容易分类,这些分数反映了这一点.

No, this isn't true, some training samples will be more easy to classify than others and these scores reflect this.

这篇关于Weka预测(百分比置信度)-这是什么意思?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆