How to interpret Weka Logistic Regression output?


Question

Please help me interpret the results of logistic regression produced by weka.classifiers.functions.Logistic from the Weka library.

I use the numeric weather data from the Weka examples:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

To create the logistic regression model I use this command:

java -cp $WEKA_INS/weka.jar weka.classifiers.functions.Logistic -t $WEKA_INS/data/weather.numeric.arff -T $WEKA_INS/data/weather.numeric.arff -d ./weather.numeric.model.arff

The three options mean:

-t <name of training file> : Sets training file.
-T <name of test file> : Sets test file. 
-d <name of output file> : Sets model output file.

Running the above command produces the following output:

Logistic Regression with ridge parameter of 1.0E-8
Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508


Time taken to build model: 0.05 seconds
Time taken to test model on training data: 0 seconds

=== Error on training data ===
Correctly Classified Instances          11               78.5714 %
Incorrectly Classified Instances         3               21.4286 %
Kappa statistic                          0.5532
Mean absolute error                      0.2066
Root mean squared error                  0.3273
Relative absolute error                 44.4963 %
Root relative squared error             68.2597 %
Total Number of Instances               14     

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

Questions:

1) First section of the report:

Coefficients...
              Class
Variable                    yes
===============================
outlook=sunny           -6.4257
outlook=overcast        13.5922
outlook=rainy           -5.6562
temperature             -0.0776
humidity                -0.1556
windy                    3.7317
Intercept                22.234

1.1) Do I understand correctly that the "Coefficients" are in fact weights that are applied to each attribute before adding them together to produce the value of the class attribute "play" equal to "yes"?

2) Second section of the report:

Odds Ratios...
              Class
Variable                    yes
===============================
outlook=sunny            0.0016
outlook=overcast    799848.4264
outlook=rainy            0.0035
temperature              0.9254
humidity                 0.8559
windy                   41.7508

2.1) What is the meaning of "Odds Ratios"?

2.2) Do they all also relate to the class attribute "play" equal to "yes"?

2.3) Why is the value of "outlook=overcast" so much bigger than the value of "outlook=sunny"?

3)

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 1 4 | b = no

3.1) What is the meaning of the Confusion Matrix?

Thank you very much for your help!

Answer

  1. Updated from a comment: the coefficients are in fact the weights that are applied to each attribute. The weighted sum is plugged into the logistic function 1/(1+exp(-weighted_sum)) to obtain a probability. Note that the "Intercept" value is added to the sum as is, without being multiplied by any of your variables. The result is the probability that the new instance belongs to class yes (> 0.5 means yes).
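As a rough check of the description above, the sketch below recomputes the probability of play=yes for every row of the weather data from the printed coefficients. It rests on two assumptions that the output does not state: each outlook=... row is a 0/1 indicator, and the single windy coefficient applies when windy is FALSE (the second declared value) and is switched off when windy is TRUE. The function and variable names are illustrative only.

```python
import math

# Coefficients copied from the Weka output (class "yes").
COEF = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}
INTERCEPT = 22.234

# The 14 rows of the data set: outlook, temperature, humidity, windy, play.
DATA = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),
    ("rainy", 65, 70, "TRUE", "no"),
    ("overcast", 64, 65, "TRUE", "yes"),
    ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"),
    ("rainy", 71, 91, "TRUE", "no"),
]


def prob_yes(outlook, temperature, humidity, windy):
    """P(play = yes): logistic function applied to intercept + weighted sum."""
    z = INTERCEPT                              # added as is, not multiplied
    z += COEF["outlook=" + outlook]            # indicator is 1 for the matching value
    z += COEF["temperature"] * temperature
    z += COEF["humidity"] * humidity
    # Assumption: the windy indicator is 1 for FALSE and 0 for TRUE.
    z += COEF["windy"] * (1 if windy == "FALSE" else 0)
    return 1.0 / (1.0 + math.exp(-z))


def confusion_matrix(rows):
    """Rows are actual classes (yes, no); columns are predicted classes (yes, no)."""
    index = {"yes": 0, "no": 1}
    matrix = [[0, 0], [0, 0]]
    for outlook, temperature, humidity, windy, actual in rows:
        p = prob_yes(outlook, temperature, humidity, windy)
        predicted = "yes" if p > 0.5 else "no"
        matrix[index[actual]][index[predicted]] += 1
    return matrix


if __name__ == "__main__":
    print(confusion_matrix(DATA))
```

With this encoding the script prints [[7, 2], [1, 4]], the same counts as the confusion matrix in the report, which suggests the assumed encoding of windy is the one Weka used here.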

  2. The odds ratios indicate how large an influence a change in that value (or a change to that value) will have on the prediction. Each odds ratio is exp(coefficient), so it is the factor by which the odds of play=yes are multiplied when a numeric attribute increases by one unit or when a nominal indicator is switched on. Like the coefficients, they are reported for class yes. I think this link does a great job explaining the odds ratios. The value of outlook=overcast is so large because if the outlook is overcast the odds are very good that play will equal yes: in this data set every overcast instance has play=yes.
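The link between the two tables in the report can be checked numerically. The sketch below exponentiates each printed coefficient and compares it with the printed odds ratio; the dictionary and function names are illustrative, and small differences are expected because the coefficients are rounded to four decimals.

```python
import math

# Coefficients and odds ratios copied from the Weka output (class "yes").
COEFFICIENTS = {
    "outlook=sunny": -6.4257,
    "outlook=overcast": 13.5922,
    "outlook=rainy": -5.6562,
    "temperature": -0.0776,
    "humidity": -0.1556,
    "windy": 3.7317,
}
REPORTED_ODDS_RATIOS = {
    "outlook=sunny": 0.0016,
    "outlook=overcast": 799848.4264,
    "outlook=rainy": 0.0035,
    "temperature": 0.9254,
    "humidity": 0.8559,
    "windy": 41.7508,
}


def odds_ratio(coefficient):
    """Factor applied to the odds of class yes per unit increase of the variable."""
    return math.exp(coefficient)


if __name__ == "__main__":
    for name, coefficient in COEFFICIENTS.items():
        computed = odds_ratio(coefficient)
        print(f"{name:18s} {computed:14.4f}   (Weka: {REPORTED_ODDS_RATIOS[name]})")
```

Read this way, the humidity value of 0.8559 means each additional unit of humidity multiplies the odds of play=yes by about 0.86, while values far from 1, such as outlook=overcast, mark variables that move the prediction strongly.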

  3. The confusion matrix simply shows you how many of the test data points are correctly and incorrectly classified. Each row is an actual class and each column is a predicted class. In your example, 7 instances of class a (yes) were correctly classified as a, whereas 2 were misclassified as b (no); of the 5 instances of class b (no), 1 was misclassified as a and 4 were classified correctly. Your question is more thoroughly answered in this question: How to read the classifier confusion matrix in WEKA.
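The summary figures in the report can be recovered from the four counts in the matrix. The sketch below computes the accuracy and, assuming that the "Kappa statistic" Weka prints is Cohen's kappa, the kappa value; the function names are illustrative.

```python
# Confusion matrix from the Weka output: rows are actual classes, columns are predicted.
MATRIX = [
    [7, 2],  # actual a = yes: 7 predicted yes, 2 predicted no
    [1, 4],  # actual b = no:  1 predicted yes, 4 predicted no
]


def accuracy(matrix):
    """Share of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's kappa: agreement beyond what the row and column totals imply by chance."""
    total = sum(sum(row) for row in matrix)
    observed = accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)  # row total * column total
        for i in range(len(matrix))
    ) / (total * total)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(MATRIX) * 100:.4f} %")
    print(f"kappa    = {kappa(MATRIX):.4f}")
```

The diagonal holds 7 + 4 = 11 of the 14 instances, which gives the 78.5714 % reported as "Correctly Classified Instances", and the kappa computed this way agrees with the reported 0.5532.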
