How to use a created model with new data in Weka

Posted: 2021/9/24 20:10:37 · Tags: java, weka

Question

I'm running some tests in Weka. I hope someone can help me, and that I can make myself clear.

Step 1: Labeling my data

```
@attribute text string
@attribute @@class@@ {derrota,vitoria,win}

@data
'O Grêmio perdeu para o Cruzeiro por 1 a 0',derrota
'O Grêmio venceu o Palmeiras em um grande jogo de futebol, nesta quarta-feira na Arena',vitoria
```

Step 2: Building a model from the tokenized data

After loading this file, I apply a StringToWordVector filter and save a new ARFF file with the words tokenized. Something like:

```
@attribute @@class@@ {derrota,vitoria,win}
@attribute o numeric
@attribute grêmio numeric
@attribute perdeu numeric
@attribute venceu numeric
... and so on

@data
{0 derrota, 1 1, 2 1, 3 1, 4 0, ...}
{0 vitoria, 1 1, 2 1, 3 0, 4 1, ...}
```
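The rows above use Weka's sparse ARFF syntax: each row lists `index value` pairs with indices starting at 0, and zero-valued attributes are normally left out. The standalone sketch below has no Weka dependency, and `SparseRow` and `format` are hypothetical names. It assumes the layout above, with the class at index 0 and the word attributes from index 1, and formats a row in that shape.

```java
import java.util.StringJoiner;

/** Hypothetical helper: formats one row in sparse ARFF style (not Weka code). */
class SparseRow {
    /**
     * The class label goes at index 0 and word attributes start at index 1.
     * Zero-valued word attributes are skipped, which is what makes the row sparse.
     */
    static String format(String classLabel, int[] wordValues) {
        StringJoiner row = new StringJoiner(", ", "{", "}");
        row.add("0 " + classLabel);
        for (int i = 0; i < wordValues.length; i++) {
            if (wordValues[i] != 0) {
                row.add((i + 1) + " " + wordValues[i]);
            }
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // Presence of o, grêmio, perdeu, venceu in the first training sentence
        System.out.println(format("derrota", new int[] {1, 1, 1, 0}));
        // Same words, but with an unknown class, as in the "simulated new data" step
        System.out.println(format("?", new int[] {1, 1, 1, 0}));
    }
}
```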

OK! Based on this ARFF file, I build my classifier model and save it.

Step 3: Testing with "simulated new data"

If I want to test my model with "simulated new data", what I'm actually doing is editing this last ARFF file and adding a line like:

```
{0 ?, 1 1, 2 1, 3 1, 4 0, ...}
```

Step 4 (my problem): How to test with really new data

So far so good. My problem is when I need to use this model with really new data. For example, take the string "O Grêmio caiu diante do Palmeiras": three of its words (caiu, diante, do) don't exist in my model, and three (O, Grêmio, Palmeiras) do.

How can I create an ARFF file with this new data that fits my model? (I know the new words will not be present, but how do I work with this?)

After supplying different test data, the following message appears:

Answer

If you use Weka programmatically, you can do this fairly easily.

- Create your training file (e.g. training.arff).
- Create Instances from the training file: `Instances trainingData = ..`
- Use StringToWordVector to transform your string attributes into a numeric representation:

Example code:

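```java
import java.io.File;
import weka.core.SelectedTag;
import weka.core.stemmers.LovinsStemmer;
import weka.core.stemmers.Stemmer;
import weka.core.stopwords.WordsFromFile;
import weka.core.tokenizers.NGramTokenizer;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Example settings (adjust to your data)
boolean useIdf = true, useStemmer = false;
int minTermFreq = 1, minGrams = 1, maxGrams = 1;

StringToWordVector filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if (useIdf) {
    filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
// Word n-grams from minGrams to maxGrams words long
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
// Stopword list file, one word per line
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer) {
    Stemmer s = new /*Iterated*/LovinsStemmer();
    filter.setStemmer(s);
}
// trainingData is the raw data from training.arff; set its class index first, e.g.
// trainingData.setClassIndex(trainingData.numAttributes() - 1);
filter.setInputFormat(trainingData);
```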
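The TF and IDF options correspond to the formulas in the StringToWordVector option descriptions. The TF transform replaces a word frequency f with log(1 + f). The IDF transform replaces it with f * log(N / n), where N is the number of documents and n is the number of documents that contain the word. The plain-Java sketch below evaluates both formulas with `Math.log`. It is not Weka's code, `TfIdf` is a hypothetical name, and it does not reproduce how Weka combines these steps with length normalization. It does show one practical consequence for the two-sentence training set in the question: a word that occurs in every document, such as grêmio, gets an IDF weight of 0.

```java
/** Hypothetical helper evaluating the documented TF and IDF formulas (not Weka code). */
class TfIdf {
    /** TF transform: log(1 + f), where f is the word's frequency in one document. */
    static double tf(double f) {
        return Math.log(1 + f);
    }

    /** IDF transform: f * log(numDocs / docsWithWord). */
    static double idf(double f, int numDocs, int docsWithWord) {
        return f * Math.log((double) numDocs / docsWithWord);
    }

    public static void main(String[] args) {
        // Two training documents; "grêmio" appears in both, so its weight drops to 0
        System.out.println(idf(1, 2, 2));
        // "perdeu" appears in only one, so it keeps a positive weight of log(2)
        System.out.println(idf(1, 2, 1));
    }
}
```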

- Apply the filter to the training data (`Filter` is `weka.filters.Filter`): `trainingData = Filter.useFilter(trainingData, filter);`

- Select a classifier to create your model.

Example code for the LibLINEAR classifier:
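```java
import weka.classifiers.Classifier;
import weka.classifiers.functions.LibLINEAR; // from the LibLINEAR package (install via the package manager)
import weka.core.SelectedTag;

LibLINEAR liblinear = new LibLINEAR();
// SVM type 0: L2-regularized logistic regression
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
Classifier cls = liblinear;
cls.buildClassifier(trainingData); // the filtered training data
```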
- Save the model.
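The core of the question in Step 4 is how to turn a new sentence into a vector with exactly the same attributes as the training data. The answer is to vectorize the new text against the fixed training vocabulary. Words that never appeared in training have no attribute, so they are simply dropped. In Weka, the usual ways to get this behavior are:

- Batch filtering: reuse the same StringToWordVector instance that was initialized on the training data, and call `Filter.useFilter(newData, filter)` on new data that has the same header as the raw training data.
- Wrapping the filter and the classifier together in a `FilteredClassifier`.

The plain-Java sketch below does not use Weka and only illustrates the idea. `FixedVocabulary` and its methods are hypothetical names. Its delimiter set is an assumption loosely modeled on Weka's default word tokenizer, and it numbers words by first occurrence, which may differ from Weka's attribute order. It lowercases tokens, as `setLowerCaseTokens(true)` does, so the new sentence matches three known words: o, grêmio, and palmeiras.

```java
import java.util.*;

/** Hypothetical sketch: vectorize new text against a vocabulary fixed at training time. */
class FixedVocabulary {
    // Assumed delimiters, loosely modeled on Weka's default word tokenizer
    private static final String DELIMITERS = "[\\s.,;:'\"()?!]+";

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split(DELIMITERS)) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Maps each training word to an attribute index, in order of first occurrence. */
    static Map<String, Integer> buildVocabulary(List<String> trainingTexts) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                vocab.putIfAbsent(token, vocab.size());
            }
        }
        return vocab;
    }

    /** Word counts over the fixed vocabulary; unknown words have no slot and are skipped. */
    static int[] vectorize(String text, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokenize(text)) {
            Integer index = vocab.get(token);
            if (index != null) counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = buildVocabulary(List.of(
                "O Grêmio perdeu para o Cruzeiro por 1 a 0",
                "O Grêmio venceu o Palmeiras em um grande jogo de futebol, nesta quarta-feira na Arena"));
        int[] v = vectorize("O Grêmio caiu diante do Palmeiras", vocab);
        // Same length as the training vectors; caiu, diante and do are ignored
        System.out.println(vocab.size() + " attributes: " + Arrays.toString(v));
    }
}
```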
