IndexOutOfBoundsException when trying to add more instances to a training set using Weka

Published: 2018/12/24 12:25:15 | Tags: java, weka


Problem description


I am trying to add more instances to my training set and perform 10-fold cross-validation.


My instances are in String format, so I use the `StringToWordVector` filter to transform them into numbers. Everything works if I do not add the extra pages I want. But when I add the call `trainSet.addAll(data2);` and pass `trainSet` to the filter, I get a strange `IndexOutOfBoundsException` in the first iteration, at `Instances fTrainSet = Filter.useFilter(trainSet, filter);`

```java
Instances data = getDataFromFile("pathtofile.arff");    // main dataset, 1821 instances
Instances data2 = getDataFromFile("anotherpath.arff");  // 709 instances I want to add
int folds = 10;
for (int i = 0; i < folds; i++) {
    Instances trainSet = data.trainCV(folds, i);        // training set
    System.out.println(trainSet.numInstances());         // prints 1638
    Instances testSet = data.testCV(folds, i);           // testing set

    // add more instances
    trainSet.addAll(data2);
    System.out.println(trainSet.numInstances());         // prints 2347

    // filter
    StringToWordVector filter = new StringToWordVector();
    filter.setWordsToKeep(10000);
    filter.setTFTransform(true);
    filter.setLowerCaseTokens(true);
    filter.setOutputWordCounts(true);
    Stemmer stemmer = new IteratedLovinsStemmer();
    filter.setStemmer(stemmer);
    WordsFromFile stopwords = new WordsFromFile();
    stopwords.setStopwords(new File(".data/stopwords2.txt"));
    filter.setStopwordsHandler(stopwords);
    // Weka expects setInputFormat() to be the last call before the filter
    // is applied, so it comes after all the options have been set
    filter.setInputFormat(trainSet);

    Instances fTrainSet = Filter.useFilter(trainSet, filter); // error!!!
    Instances fTestSet = Filter.useFilter(testSet, filter);
    ....
    // classification and evaluation....
```
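Three of the options set in the code above (`setLowerCaseTokens`, `setOutputWordCounts`, `setTFTransform`) decide what each output attribute holds. The following is a rough, hypothetical sketch in plain Java, not Weka's implementation: it lowercases tokens, counts how often each word occurs in one document, and then applies the TF transform, which the `StringToWordVector` documentation describes as replacing each frequency f_ij with log(1 + f_ij). The delimiter set imitates Weka's default `WordTokenizer` and is an assumption; stemming, stopwords, `setWordsToKeep` pruning, IDF and length normalization are deliberately left out.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch of one document's word vector with lowercased tokens,
// word counts and the log(1 + f) TF transform. Not Weka code.
final class WordVectorSketch {

    // Delimiters modelled on Weka's default WordTokenizer (assumption).
    private static final String DELIMITERS = "[ \\r\\n\\t.,;:'\"()?!]+";

    static Map<String, Double> vectorize(String doc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : doc.toLowerCase().split(DELIMITERS)) {
            if (token.isEmpty()) {
                continue; // split() yields a leading "" if doc starts with a delimiter
            }
            counts.merge(token, 1, Integer::sum); // raw word count f_ij
        }
        Map<String, Double> vector = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // TF transform: f_ij -> log(1 + f_ij), natural logarithm
            vector.put(e.getKey(), Math.log(1.0 + e.getValue()));
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(vectorize("The cat saw the other cat. The end!"));
    }
}
```

In this sketch "The" and "the" end up in the same attribute because of the lowercasing, and a word seen three times gets log(4) rather than 3, which dampens the influence of very frequent words.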


I get the following error when I apply the filter:

```
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2161, Size: 1749
    at java.util.ArrayList.rangeCheck(Unknown Source)
    at java.util.ArrayList.get(Unknown Source)
    at weka.core.Attribute.addStringValue(Attribute.java:924)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:150)
    at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
    at weka.filters.Filter.copyValues(Filter.java:399)
    at weka.filters.Filter.bufferInput(Filter.java:342)
    at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:655)
    at weka.filters.Filter.useFilter(Filter.java:692)
    at CrossValidationExample.main(CrossValidationExample.java:108)
```

What could be wrong?

Recommended answer



After some searching, I realized that something is wrong with the `addAll` function. One reason I can think of is that `addAll` just adds references to the instances, and that causes a problem when I try to use them with the filter.
Instead, I used the `merge` function proposed here: https://stackoverflow.com/a/12359788/3923800. I replaced `trainSet.addAll(data2);` with `Instances newTrainSet = merge(trainSet, data2);`, passed `newTrainSet` to the filter, and everything works fine.
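One plausible reading of the stack trace, offered as a hedged explanation: Weka stores a string attribute value as an integer index into a table of strings kept by the attribute of the dataset the instance belongs to, and the Javadoc of `Instances.add(Instance)` notes that string or relational values are not transferred when an instance is added. After `trainSet.addAll(data2)`, the rows from `data2` still carry indices into `data2`'s string table, but those indices are now looked up in `trainSet`'s table, which is why `Attribute.addStringValue` ends up asking for index 2161 in a list of size 1749. A merge that copies values as strings avoids this. The sketch below is a toy model in plain Java, not Weka code; `Column`, `addAllRaw` and `mergeByValue` are hypothetical names, and only a single string column is modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not Weka code: one string column stored the way Weka stores string
// attributes, i.e. each row holds an int index into a per-dataset value table.
final class StringColumnMerge {

    static final class Column {
        final List<String> table = new ArrayList<>(); // distinct strings of this dataset
        final List<Integer> rows = new ArrayList<>();  // per-row index into table

        // Stores a string by value, reusing the table entry if it already exists.
        void add(String value) {
            int idx = table.indexOf(value);
            if (idx < 0) {
                table.add(value);
                idx = table.size() - 1;
            }
            rows.add(idx);
        }

        // Resolves a row against THIS dataset's table; throws
        // IndexOutOfBoundsException if the stored index does not fit it.
        String get(int row) {
            return table.get(rows.get(row));
        }
    }

    // Hypothetical analogue of addAll on raw instances: copies the indices, not the strings.
    static void addAllRaw(Column dest, Column src) {
        dest.rows.addAll(src.rows);
    }

    // Hypothetical analogue of the merge fix: re-adds every value as a string,
    // so each index is re-resolved against the destination table.
    static Column mergeByValue(Column a, Column b) {
        Column out = new Column();
        for (int r = 0; r < a.rows.size(); r++) out.add(a.get(r));
        for (int r = 0; r < b.rows.size(); r++) out.add(b.get(r));
        return out;
    }

    public static void main(String[] args) {
        Column train = new Column();
        train.add("apple");
        train.add("banana");
        Column extra = new Column();
        extra.add("cherry");
        extra.add("date");
        extra.add("elder");

        Column merged = mergeByValue(train, extra);
        System.out.println(merged.get(4)); // elder

        addAllRaw(train, extra);
        System.out.println(train.get(2));  // apple: wrong string, extra's index 0 reused
        try {
            train.get(4);                  // extra's index 2, but train's table has size 2
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```

Note that in this model small indices do not fail at all; they silently resolve to the wrong string, so an exception is only the most visible symptom. In Weka itself, the corresponding step would be to add each instance to the destination dataset and then re-set every string attribute on the stored copy (for example `dest.lastInstance()`) with `Instance.setValue(int, String)`, which adds the string to the destination attribute's table when needed.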
