Does the test file in Weka require the same number of features as the training file, or fewer?

Question

I have prepared two different .arff files from two different datasets, one for testing and the other for training. Both have the same number of instances but different features, so the dimensionality of the feature vector differs between the two files. When I ran cross-validation on each file separately, it worked perfectly, which suggests that the .arff files are properly prepared and contain no errors.

Now, if I evaluate using the training file, which has lower dimensionality than the test file, I get the following error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 5986
at weka.classifiers.bayes.NaiveBayesMultinomial.probOfDocGivenClass(NaiveBayesMultinomial.java:295)
at weka.classifiers.bayes.NaiveBayesMultinomial.distributionForInstance(NaiveBayesMultinomial.java:254)
at weka.classifiers.Evaluation.evaluationForSingleInstance(Evaluation.java:1657)
at weka.classifiers.Evaluation.evaluateModelOnceAndRecordPrediction(Evaluation.java:1694)
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1574)
at TrainCrossValidateARFF.main(TrainCrossValidateARFF.java:44)

Does the test file in Weka need the same number of features as the training file, or fewer? This is the code I use for evaluation:

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainCrossValidateARFF {
    public static void main(String[] args) throws Exception {
        // Both the training file and the test file are required (args[1] is read below).
        if (args.length != 2) {
            System.out.println("USAGE: TrainCrossValidateARFF <train_arff_file> <test_arff_file>");
            System.exit(-1);
        }

        // Load the training set; the last attribute is the class.
        String TrainarffFilePath = args[0];
        DataSource ds = new DataSource(TrainarffFilePath);
        Instances Train = ds.getDataSet();
        Train.setClassIndex(Train.numAttributes() - 1);

        // Load the test set; the last attribute is the class.
        String TestarffFilePath = args[1];
        DataSource ds1 = new DataSource(TestarffFilePath);
        Instances Test = ds1.getDataSet();
        Test.setClassIndex(Test.numAttributes() - 1);

        System.out.println("-----------" + TrainarffFilePath + "--------------");
        System.out.println("-----------" + TestarffFilePath + "--------------");

        // Train on the training set only.
        NaiveBayesMultinomial naiveBayes = new NaiveBayesMultinomial();
        naiveBayes.buildClassifier(Train);

        // Evaluate on the test set. This is the call that throws
        // ArrayIndexOutOfBoundsException when Test has more attributes than Train.
        Evaluation eval = new Evaluation(Train);
        eval.evaluateModel(naiveBayes, Test);
        System.out.println(eval.toSummaryString("\nResults\n======\n", false));
    }
}

Recommended answer

The same number of features is required. You may also need to insert ? for the class attribute.

According to Weka architect Mark Hall:

To be compatible, the header information of the two sets of instances needs to be the same - same number of attributes, with the same names in the same order. Furthermore, any nominal attributes must have the same values declared in the same order in both sets of instances. For unknown class values in your test set just set the value of each to missing - i.e. "?".
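The compatibility rule quoted above can be checked before evaluation. Weka's Instances class provides an equalHeaders method for comparing two loaded datasets. The sketch below is a plain-Java illustration of the same idea that does not use Weka at all. The class name ArffHeaderCheck, its helper methods, and the tiny inline ARFF files are hypothetical, and the parser handles only simple single-line @attribute declarations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: compares the @attribute
// declarations of two ARFF files given as text.
class ArffHeaderCheck {

    // Collects the attribute declarations (name, type, nominal values) in file order.
    static List<String> attributeDeclarations(String arffText) {
        List<String> declarations = new ArrayList<>();
        for (String line : arffText.split("\\R")) {
            String trimmed = line.trim();
            String lower = trimmed.toLowerCase();
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section begins
            }
            if (lower.startsWith("@attribute")) {
                // Drop the keyword and normalize whitespace so that
                // formatting differences alone do not count as a mismatch.
                String body = trimmed.substring("@attribute".length()).trim();
                declarations.add(body.replaceAll("\\s+", " "));
            }
        }
        return declarations;
    }

    // Returns null when the headers match, otherwise a description of the first difference.
    static String firstHeaderDifference(String trainArff, String testArff) {
        List<String> train = attributeDeclarations(trainArff);
        List<String> test = attributeDeclarations(testArff);
        if (train.size() != test.size()) {
            return "attribute count differs: train=" + train.size() + ", test=" + test.size();
        }
        for (int i = 0; i < train.size(); i++) {
            if (!train.get(i).equals(test.get(i))) {
                return "attribute " + i + " differs: '" + train.get(i)
                        + "' vs '" + test.get(i) + "'";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String train = "@relation docs\n"
                + "@attribute word_a numeric\n"
                + "@attribute word_b numeric\n"
                + "@attribute class {pos,neg}\n"
                + "@data\n1,0,pos\n0,2,neg\n";
        // Same header as train; unknown class values are written as "?".
        String compatibleTest = "@relation docs_test\n"
                + "@attribute word_a numeric\n"
                + "@attribute word_b numeric\n"
                + "@attribute class {pos,neg}\n"
                + "@data\n3,1,?\n";
        // One extra attribute, as in the question: more features than the training file.
        String widerTest = "@relation docs_test\n"
                + "@attribute word_a numeric\n"
                + "@attribute word_b numeric\n"
                + "@attribute word_c numeric\n"
                + "@attribute class {pos,neg}\n"
                + "@data\n3,1,4,?\n";

        System.out.println(firstHeaderDifference(train, compatibleTest));
        System.out.println(firstHeaderDifference(train, widerTest));
    }
}
```

In this sketch the first comparison finds no difference, while the second reports a mismatch in the number of attributes. That mismatch is the situation that triggers the ArrayIndexOutOfBoundsException in the question.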
