Weka StringToWordVector filter: passing options as a string


Problem description

I'm trying to filter a dataset using Weka's Java API. I've successfully filtered the attributes I want with a StringToWordVector filter in Weka's GUI, but I can't seem to do the same in my Java code. I copied the auto-generated filter options from the GUI and pasted them into my code, but I keep getting errors. Currently, my code looks like this:

Instances newInsts = new Instances(this.instances);
StringToWordVector stringFilter = new StringToWordVector();
stringFilter.setOptions(
    weka.core.Utils.splitOptions("-R 1,2,3,4,8 -W 1000 "
        + "-prune-rate -1.0 -N 0 -stemmer "
        + "weka.core.stemmers.NullStemmer -M 1 "
        + "-tokenizer \"weka.core.tokenizers.WordTokenizer "
        + "-delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\""));
stringFilter.setInputFormat(newInsts);
newInsts = Filter.useFilter(newInsts, stringFilter);

But I keep getting this error in my Eclipse console: "No value given for -delimiters option."

(I added line breaks to the code above for readability. I suspect the problem has something to do with escaping the quotation marks...)
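The escaping suspicion is plausible. A Java string literal is unescaped once by the compiler, and Utils.splitOptions then applies its own quote handling to the resulting runtime string, so text pasted from the GUI loses one level of escaping on the way in. The stdlib-only sketch below illustrates this: it prints the runtime string that the question's literal produces and splits it with `splitQuoted`, a hypothetical, simplified splitter that only approximates Weka's rule (a quoted token ends at the first double quote not preceded by a backslash). It is not Weka's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeDemo {
    // Hypothetical, simplified stand-in for Weka's option splitting (not Weka code):
    // tokens are whitespace-separated; a token starting with '"' runs to the next
    // '"' that is not preceded by a backslash. Escapes inside quotes are left as-is,
    // which is enough to see where each token ends.
    static List<String> splitQuoted(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int j;
            if (c == '"') {
                j = i + 1;
                while (j < s.length() && s.charAt(j) != '"') {
                    if (s.charAt(j) == '\\') j++; // skip the escaped character
                    j++;
                }
                if (j >= s.length()) throw new IllegalArgumentException("Quote parse error");
                out.add(s.substring(i + 1, j));
                i = j + 1;
            } else {
                j = i;
                while (j < s.length() && !Character.isWhitespace(s.charAt(j))) j++;
                out.add(s.substring(i, j));
                i = j;
            }
        }
        return out;
    }

    // The -tokenizer part of the question's option string, escaped exactly as posted.
    static final String AS_WRITTEN =
        "-tokenizer \"weka.core.tokenizers.WordTokenizer "
        + "-delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"";

    public static void main(String[] args) {
        System.out.println("runtime string: " + AS_WRITTEN);
        String spec = splitQuoted(AS_WRITTEN).get(1); // value of -tokenizer
        System.out.println("tokenizer spec: [" + spec + "]");
        System.out.println("spec options:   " + splitQuoted(spec));
    }
}
```

Running it should show the tokenizer spec ending right after "-delimiters": the \" before the delimiter list has become a bare quote at runtime, so it closes the quoted -tokenizer value early. Splitting that spec again leaves -delimiters as the last token with nothing after it, which is consistent with the "No value given for -delimiters option" error.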

Thanks!

Solution

You can actually omit most of the options, as they are the defaults for StringToWordVector. The delimiters you're trying to pass are the default delimiters in the default tokenizer, WordTokenizer, which are:

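" \r\n\t.,;:'\"()?!" (written as a Java string literal: space, carriage return, newline, tab, and the characters . , ; : ' " ( ) ? !)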
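If, as the answer says, the remaining options are defaults, the call from the question could presumably shrink to stringFilter.setOptions(weka.core.Utils.splitOptions("-R 1,2,3,4,8")). If non-default delimiters are ever needed, the GUI text has to be escaped once more for Java: every backslash doubled and every double quote prefixed with a backslash. The stdlib-only sketch below checks that idea with a hypothetical, simplified `splitQuoted`/`unescape` pair (an approximation of Weka's parsing that handles only \\ \" \' \r \n \t, not Weka's code): splitting the re-escaped literal twice should yield exactly the default delimiter set.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedEscapeDemo {
    // Hypothetical, simplified stand-in for Weka's option splitting (not Weka code):
    // a token starting with '"' ends at the next unescaped '"', and its contents
    // are then unescaped; unquoted tokens end at whitespace.
    static List<String> splitQuoted(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int j;
            if (c == '"') {
                j = i + 1;
                while (j < s.length() && s.charAt(j) != '"') {
                    if (s.charAt(j) == '\\') j++; // skip the escaped character
                    j++;
                }
                if (j >= s.length()) throw new IllegalArgumentException("Quote parse error");
                out.add(unescape(s.substring(i + 1, j)));
                i = j + 1;
            } else {
                j = i;
                while (j < s.length() && !Character.isWhitespace(s.charAt(j))) j++;
                out.add(s.substring(i, j));
                i = j;
            }
        }
        return out;
    }

    // Turns \r \n \t into control characters and \x into x for any other x.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char n = s.charAt(++i);
                switch (n) {
                    case 'r': sb.append('\r'); break;
                    case 'n': sb.append('\n'); break;
                    case 't': sb.append('\t'); break;
                    default: sb.append(n); // covers \\ \" \'
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // The GUI's -tokenizer text with every backslash and quote escaped once more for Java.
    static final String FIXED =
        "-tokenizer \"weka.core.tokenizers.WordTokenizer "
        + "-delimiters \\\" \\\\r\\\\n\\\\t.,;:\\\\\\'\\\\\\\"()?!\\\"\"";

    public static void main(String[] args) {
        String spec = splitQuoted(FIXED).get(1);      // first level: the tokenizer spec
        List<String> inner = splitQuoted(spec);       // second level: the spec's own options
        String delimiters = inner.get(inner.indexOf("-delimiters") + 1);
        System.out.println(delimiters.equals(" \r\n\t.,;:'\"()?!"));
    }
}
```

Since setOptions takes a String[], another option would be to build that array directly, with "-tokenizer" and the tokenizer spec as separate elements; that removes the outer quoting level entirely, leaving only the spec's own quoting to get right.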
