WEKA-generated models do not seem to predict class and distribution given the attribute index


Problem description


Overview

I am using the WEKA API 3.7.10 (developer version) to use my pre-made .model files.

I made 25 models: five outcome variables for five algorithms.

  • J48 decision tree
  • Alternating decision tree
  • Random forest
  • LogitBoost
  • Random subspace

I am having problems with J48, random subspace and random forest.

Necessary files

The following is the ARFF representation of my data after creation:

@relation WekaData

@attribute ageDiagNum numeric
@attribute raceGroup {Black,Other,Unknown,White}
@attribute stage3 {0,I,IIA,IIB,IIIA,IIIB,IIIC,IIINOS,IV,'UNK Stage'}
@attribute m3 {M0,M1,MX}
@attribute reasonNoCancerSurg {'Not performed, patient died prior to recommended surgery','Not recommended','Not recommended, contraindicated due to other conditions','Recommended but not performed, patient refused','Recommended but not performed, unknown reason','Recommended, unknown if performed','Surgery performed','Unknown; death certificate or autopsy only case'}
@attribute ext2 {00,05,10,11,13,14,15,16,17,18,20,21,23,24,25,26,27,28,30,31,33,34,35,36,37,38,40,50,60,70,80,85,99}
@attribute time2 {}
@attribute time4 {}
@attribute time6 {}
@attribute time8 {}
@attribute time10 {}

@data
65,White,IIA,MX,'Not recommended, contraindicated due to other conditions',14,?,?,?,?,?

I need to get the binary attributes time2 to time10 from their respective models.


Below are snippets of the code I use to get the predictions from all the model files:

private static Map<String, Object> predict(Instances instances,
        Classifier classifier, int attributeIndex) {
    Map<String, Object> map = new LinkedHashMap<String, Object>();
    int instanceIndex = 0; // do not change, equal to row 1
    double[] percentage = { 0 };
    double outcomeValue = 0;
    AbstractOutput abstractOutput = null;

    if(classifier.getClass() == RandomForest.class || classifier.getClass() == RandomSubSpace.class) {
        // has problems predicting time2 to time10
        instances.setClassIndex(5); 
    } else {
        // works as intended in LogitBoost and ADTree
        instances.setClassIndex(attributeIndex);    
    }

    try {
        outcomeValue = classifier.classifyInstance(instances.instance(0));
        percentage = classifier.distributionForInstance(instances
                .instance(instanceIndex));
    } catch (Exception e) {
        e.printStackTrace();
    }

    map.put("Class", outcomeValue);

    if (percentage.length > 0) {
        double percentageRaw = 0;
        if (outcomeValue == new Double(1)) {
            percentageRaw = percentage[1];
        } else {
            percentageRaw = 1 - percentage[0];
        }
        map.put("Percentage", percentageRaw);
    } else {
        // because J48 returns an error if percentage[i] because it's empty
        map.put("Percentage", new Double(0));
    }

    return map;
}


Here are the models I use to predict the outcome for time2; hence we will use index 6:

instances.setClassIndex(5); 

Problems

  • As I said before, LogitBoost and ADTree have no problem in this straightforward method compared to the other three, as I followed the "Use WEKA in your Java code" tutorial.

  • [Solved] Based on my tweaking, RandomForest and RandomSubSpace throw an ArrayIndexOutOfBoundsException when told to predict time2 to time10.

    java.lang.ArrayIndexOutOfBoundsException: 0
        at weka.classifiers.meta.Bagging.distributionForInstance(Bagging.java:586)
        at weka.classifiers.trees.RandomForest.distributionForInstance(RandomForest.java:602)
        at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:70)
    

    The stack trace points to this line as the source of the error:

    outcomeValue = classifier.classifyInstance(instances.instance(0));
    

    Solution: I had a copy-paste error during ARFF file creation for the binary variables time2 to time10, in how the values from FastVector<String>() were assigned to the FastVector<Attribute>() object. All ten of my RandomForest and RandomSubSpace models are working fine now!

  • [Solved] The J48 decision tree has a new problem now. Instead of not providing any predictions, it now throws an exception:

    java.lang.ArrayIndexOutOfBoundsException: 11
        at weka.core.DenseInstance.value(DenseInstance.java:332)
        at weka.core.AbstractInstance.isMissing(AbstractInstance.java:315)
        at weka.classifiers.trees.j48.C45Split.whichSubset(C45Split.java:494)
        at weka.classifiers.trees.j48.ClassifierTree.getProbs(ClassifierTree.java:670)
        at weka.classifiers.trees.j48.ClassifierTree.classifyInstance(ClassifierTree.java:231)
        at weka.classifiers.trees.J48.classifyInstance(J48.java:266)
    

    and it traces to the line

    outcomeValue = classifier.classifyInstance(instances.instance(0));
    

    Solution: actually, I happened to run the program with J48 again and it worked, giving the outcome variable and the associated distribution.


I hope someone can help me sort out this issue. I really do not know what is wrong with this code, as I have checked the Javadocs and examples online and the constant predictions still persist.

(I am currently checking the main program for the WEKA GUI but please help me out here :-) )

Solution

I've only looked at the RandomForest problem for now. It is because the Bagging class extracts the number of different classes from the data instance itself, and not from the model. You say in your text that time2 to time10 are binary, but you just don't say it in your ARFF file, and so the Bagging class has no clue about how many classes there are.

So you just have to specify in your ARFF file that time2 is binary, e.g.:

@attribute time2 {0,1}

and you won't get the exception any more.

I've not looked at the J48 problem, because it may be the same issue with ARFF definition.
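Since the cause is a nominal attribute declared with an empty value list, a quick check of the ARFF header before loading any model might catch it early. The following is only a sketch in plain Java with no WEKA dependency; the class and method names (ArffHeaderCheck, emptyNominalAttributes) are hypothetical, and it assumes unquoted attribute names without spaces, as in the header shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WEKA: flags "@attribute <name> {}" lines.
final class ArffHeaderCheck {
    // Matches a nominal declaration whose braces contain only whitespace.
    // Assumption: the attribute name is unquoted and contains no spaces.
    private static final Pattern EMPTY_NOMINAL =
            Pattern.compile("@attribute\\s+(\\S+)\\s+\\{\\s*\\}", Pattern.CASE_INSENSITIVE);

    // Returns the names of all attributes declared with an empty value list.
    static List<String> emptyNominalAttributes(List<String> headerLines) {
        List<String> names = new ArrayList<>();
        for (String line : headerLines) {
            Matcher m = EMPTY_NOMINAL.matcher(line.trim());
            if (m.matches()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "@relation WekaData",
                "@attribute ageDiagNum numeric",
                "@attribute m3 {M0,M1,MX}",
                "@attribute time2 {}",
                "@attribute time4 {0,1}");
        System.out.println(emptyNominalAttributes(header)); // prints [time2]
    }
}
```

Any name this returns is an attribute that cannot serve as a class attribute until its values are declared, e.g. {0,1}.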

Test code:

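  import java.util.Arrays;

  import weka.classifiers.Classifier;
  import weka.classifiers.trees.J48;
  import weka.core.Instances;
  import weka.core.SerializationHelper;
  import weka.core.converters.ConverterUtils.DataSource;

  // The class name is arbitrary; the original snippet showed only main().
  public class J48ModelTest {
      public static void main(String[] argv) {
          try {
              // Load the serialized J48 model from disk.
              Classifier cls = (Classifier) SerializationHelper.read("bosom.100k.2.j48.MODEL");
              J48 c = (J48) cls;

              // Load the ARFF data and mark time2 (zero-based index 6) as the class.
              DataSource source = new DataSource("data.arff");
              Instances data = source.getDataSet();
              data.setClassIndex(6);

              // Predict the class value and the class distribution for the first row.
              double outcomeValue = c.classifyInstance(data.instance(0));
              System.out.println("outcome " + outcomeValue);
              double[] p = c.distributionForInstance(data.instance(0));
              System.out.println(Arrays.toString(p));
          } catch (Exception e) {
              e.printStackTrace();
          }
      }
  }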
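The distribution array returned for a nominal class has one entry per declared class value, so reading the probability of one outcome is an index lookup. A minimal sketch, assuming the class attribute is declared as {0,1} so that index 1 corresponds to the value 1: the helper name DistributionUtil.probabilityOf is hypothetical, the code is plain Java with no WEKA dependency, and it returns NaN instead of throwing when the array is shorter than expected (the question reports an empty array for J48).

```java
// Hypothetical helper, not part of WEKA.
final class DistributionUtil {
    // Returns the probability stored at classValueIndex, or NaN if the
    // distribution is null or has no entry at that index.
    static double probabilityOf(double[] distribution, int classValueIndex) {
        if (distribution == null
                || classValueIndex < 0
                || classValueIndex >= distribution.length) {
            return Double.NaN;
        }
        return distribution[classValueIndex];
    }

    public static void main(String[] args) {
        double[] dist = {0.25, 0.75}; // made-up values for illustration only
        System.out.println(probabilityOf(dist, 1));          // prints 0.75
        System.out.println(probabilityOf(new double[0], 1)); // prints NaN
    }
}
```

Returning NaN keeps a missing probability distinguishable from a genuine probability of 0, which a default of 0 would hide.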
