WEKA-generated models do not seem to predict class and distribution given the attribute index

Posted: 2020/10/19 19:21:06 | Tags: java, machine-learning, weka, decision-tree, prediction


Problem description

Overview

I am using the WEKA API 3.7.10 (developer version) to use my pre-made .model files.

I made 25 models: five outcome variables for five algorithms.


J48 decision tree.
Alternating decision tree
Random forest
LogitBoost
Random subspace


I am having problems with J48, Random subspace and random forest.

Necessary files

The following is the ARFF representation of my data after creation:

```
@relation WekaData

@attribute ageDiagNum numeric
@attribute raceGroup {Black,Other,Unknown,White}
@attribute stage3 {0,I,IIA,IIB,IIIA,IIIB,IIIC,IIINOS,IV,'UNK Stage'}
@attribute m3 {M0,M1,MX}
@attribute reasonNoCancerSurg {'Not performed, patient died prior to recommended surgery','Not recommended','Not recommended, contraindicated due to other conditions','Recommended but not performed, patient refused','Recommended but not performed, unknown reason','Recommended, unknown if performed','Surgery performed','Unknown; death certificate or autopsy only case'}
@attribute ext2 {00,05,10,11,13,14,15,16,17,18,20,21,23,24,25,26,27,28,30,31,33,34,35,36,37,38,40,50,60,70,80,85,99}
@attribute time2 {}
@attribute time4 {}
@attribute time6 {}
@attribute time8 {}
@attribute time10 {}

@data
65,White,IIA,MX,'Not recommended, contraindicated due to other conditions',14,?,?,?,?,?
```

I need to get the binary attributes time2 to time10 from their respective models.
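The header above declares time2 to time10 as nominal attributes with empty value sets (`{}`), so nothing in the file says they are binary. A minimal sketch of a pre-flight check that flags such declarations is shown below. It uses only the Java standard library; the class and method names are hypothetical, and it assumes unquoted attribute names without spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WEKA: scans ARFF header lines for
// nominal attributes that were declared without any values.
final class ArffHeaderCheck {
    // Matches "@attribute <name> {}" where the braces hold only whitespace.
    // Assumption: the attribute name is unquoted and contains no spaces.
    private static final Pattern EMPTY_NOMINAL = Pattern.compile(
            "^\\s*@attribute\\s+(\\S+)\\s*\\{\\s*\\}\\s*$",
            Pattern.CASE_INSENSITIVE);

    /** Returns the names of nominal attributes declared with no values. */
    static List<String> emptyNominalAttributes(List<String> headerLines) {
        List<String> names = new ArrayList<>();
        for (String line : headerLines) {
            Matcher m = EMPTY_NOMINAL.matcher(line);
            if (m.matches()) {
                names.add(m.group(1));
            }
        }
        return names;
    }
}
```

Run against the header lines above, this would report time2, time4, time6, time8 and time10, which are exactly the attributes the models are asked to predict.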



Below are snippets of the code I use to get the predictions from all the model files:

```java
private static Map<String, Object> predict(Instances instances,
        Classifier classifier, int attributeIndex) {
    Map<String, Object> map = new LinkedHashMap<String, Object>();
    int instanceIndex = 0; // do not change, equal to row 1
    double[] percentage = { 0 };
    double outcomeValue = 0;
    AbstractOutput abstractOutput = null;

    if(classifier.getClass() == RandomForest.class || classifier.getClass() == RandomSubSpace.class) {
        // has problems predicting time2 to time10
        instances.setClassIndex(5);
    } else {
        // works as intended in LogitBoost and ADTree
        instances.setClassIndex(attributeIndex);
    }

    try {
        outcomeValue = classifier.classifyInstance(instances.instance(0));
        percentage = classifier.distributionForInstance(instances
                .instance(instanceIndex));
    } catch (Exception e) {
        e.printStackTrace();
    }

    map.put("Class", outcomeValue);

    if (percentage.length > 0) {
        double percentageRaw = 0;
        if (outcomeValue == new Double(1)) {
            percentageRaw = percentage[1];
        } else {
            percentageRaw = 1 - percentage[0];
        }
        map.put("Percentage", percentageRaw);
    } else {
        // because J48 returns an error if percentage[i] because it's empty
        map.put("Percentage", new Double(0));
    }

    return map;
}
```
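For a two-class distribution whose entries sum to 1, both branches of the percentage logic above yield the same number (`percentage[1]`), so the lookup can be reduced to a single bounds-checked read. The sketch below illustrates that idea with a hypothetical helper. Returning `Double.NaN` for a missing class is an assumption made here, not something the original code does.

```java
// Hypothetical helper: reads one class probability from a distribution
// array without risking an ArrayIndexOutOfBoundsException.
final class Distributions {
    /**
     * Returns the probability stored at classIndex, or Double.NaN when the
     * distribution is null or too short to contain that class.
     */
    static double probabilityOf(double[] distribution, int classIndex) {
        if (distribution == null
                || classIndex < 0
                || classIndex >= distribution.length) {
            return Double.NaN; // assumption: NaN signals "no probability available"
        }
        return distribution[classIndex];
    }
}
```

Using NaN instead of 0 keeps "the model produced no distribution" distinguishable from "the model assigned probability zero".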




Here are the models I use to predict the outcome for time2, hence we will use index 6:

```java
instances.setClassIndex(5);
```



ADTree model for time2 prediction
J48 model for time2 prediction
RandomForest model for time2 prediction
LogitBoost model for time2 prediction
RandomSubSpace model for time2 prediction
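Class indices are zero-based, so in the header shown earlier time2 (the seventh attribute) sits at index 6, while index 5 is ext2. One way to avoid hardcoding the number is to resolve the index from the attribute name. The sketch below does this over a plain list of names using only the standard library; the class name is hypothetical. With WEKA itself, `instances.attribute("time2").index()` should give the same result, though that call is not exercised here.

```java
import java.util.List;

// Hypothetical helper: resolves a zero-based attribute index by name.
final class AttributeIndex {
    /** Returns the zero-based position of name, or throws if it is absent. */
    static int indexOf(List<String> attributeNames, String name) {
        int idx = attributeNames.indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("No attribute named " + name);
        }
        return idx;
    }
}
```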


Problems


As I said before, LogitBoost and ADTree have no problem in this straightforward method compared to the other three, as I followed the "Use WEKA in your Java code" tutorial.
[Solved] Based on my tweaking, RandomForest and RandomSubSpace throw an ArrayIndexOutOfBoundsException when told to predict time2 to time10.

```
java.lang.ArrayIndexOutOfBoundsException: 0
    at weka.classifiers.meta.Bagging.distributionForInstance(Bagging.java:586)
    at weka.classifiers.trees.RandomForest.distributionForInstance(RandomForest.java:602)
    at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
```

The stack trace points to this line as the root of the error:

```java
outcomeValue = classifier.classifyInstance(instances.instance(0));
```



Solution: I had a copy-paste error in the ARFF file creation for the binary variables time2 to time10, in the code that assigns the FastVector<String>() of values to the FastVector<Attribute>() object. All ten of my RandomForest and RandomSubSpace models are working fine now!

[Solved] The J48 decision tree has a new problem now. Instead of providing no predictions, it now throws an error:

```
java.lang.ArrayIndexOutOfBoundsException: 11
    at weka.core.DenseInstance.value(DenseInstance.java:332)
    at weka.core.AbstractInstance.isMissing(AbstractInstance.java:315)
    at weka.classifiers.trees.j48.C45Split.whichSubset(C45Split.java:494)
    at weka.classifiers.trees.j48.ClassifierTree.getProbs(ClassifierTree.java:670)
    at weka.classifiers.trees.j48.ClassifierTree.classifyInstance(ClassifierTree.java:231)
    at weka.classifiers.trees.J48.classifyInstance(J48.java:266)
```

and it traces to the line

```java
outcomeValue = classifier.classifyInstance(instances.instance(0));
```



Solution: actually, I ran the program with J48 again on a whim and it worked, giving the outcome variable and the associated distributions.
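The index 11 in the J48 trace is one past the last attribute of the ARFF header shown earlier, which has 11 attributes (indices 0 to 10). One possible cause, not confirmed anywhere in this post, is that the header used at prediction time differs from the one the model was trained on. A minimal sketch of a comparison over attribute-name lists follows; the class name and both lists are hypothetical, and obtaining the training-time names is left to the caller.

```java
import java.util.List;

// Hypothetical helper: compares the attribute names a model was trained on
// with the names supplied at prediction time.
final class HeaderDiff {
    /** Returns a description of the first difference, or null if the lists match. */
    static String firstMismatch(List<String> trained, List<String> supplied) {
        if (trained.size() != supplied.size()) {
            return "attribute count differs: "
                    + trained.size() + " vs " + supplied.size();
        }
        for (int i = 0; i < trained.size(); i++) {
            if (!trained.get(i).equals(supplied.get(i))) {
                return "attribute " + i + " differs: "
                        + trained.get(i) + " vs " + supplied.get(i);
            }
        }
        return null; // headers agree on names and order
    }
}
```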





I hope someone can help me sort out this issue. I really do not know what is wrong with this code, as I have checked the Javadocs and examples online and the constant predictions still persist.

(I am currently checking the main program for the WEKA GUI but please help me out here :-) )
Solution
I've only looked at the RandomForest problem for now. It is because the Bagging class
extracts the number of different classes from the data instance itself, and not from the model.
You say in your text that time2 to time10 are binary, but you just don't say it in your ARFF file,
and so the Bagging class has no clue about how many classes there are.

So you just have to specify in your ARFF file that time2 is binary, e.g.:
@attribute time2 {0,1}

and you won't get any Exception any more.

I've not looked at the J48 problem, because it may be the same issue with ARFF definition.
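To see why an empty class declaration can surface as `ArrayIndexOutOfBoundsException: 0`, consider a simplified stand-in for a vote tally. This is not the actual Bagging source, only an illustration under the assumption described above that the array is sized by the number of class values declared in the data. The class and method names are hypothetical.

```java
// Simplified illustration, not WEKA code: tallies votes into an array
// whose length comes from the declared number of class values.
final class VoteDemo {
    /** Counts one vote per entry in votes; each vote is a class index. */
    static double[] tally(int numClasses, int[] votes) {
        double[] counts = new double[numClasses];
        for (int v : votes) {
            // With numClasses == 0 the array is empty, so even index 0 is
            // out of bounds and this line throws.
            counts[v]++;
        }
        return counts;
    }
}
```

With `@attribute time2 {0,1}` the declared class count is 2 and the tally has room for both outcomes; with `@attribute time2 {}` it is 0 and the first vote already fails.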

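Test code:

```java
import java.util.Arrays;

import weka.classifiers.Classifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

// The class name is chosen here for illustration only.
public class J48Test {
    public static void main(String[] argv) {
        try {
            // Load the serialized J48 model.
            Classifier cls = (Classifier) SerializationHelper.read("bosom.100k.2.j48.MODEL");
            J48 c = (J48) cls;

            // Load the ARFF data; zero-based index 6 is time2.
            DataSource source = new DataSource("data.arff");
            Instances data = source.getDataSet();
            data.setClassIndex(6);

            // Predict the class and the class distribution for the first row.
            double outcomeValue = c.classifyInstance(data.instance(0));
            System.out.println("outcome " + outcomeValue);
            double[] p = c.distributionForInstance(data.instance(0));
            System.out.println(Arrays.toString(p));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```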


                        