WEKA: Classify instances with a deserialized model


Question

I did the following in Weka Explorer:

  • Loaded the ARFF file
  • Applied the StringToWordVector filter
  • Selected IBk as the best classifier
  • Generated/saved the my_model.model binary file

In my Java code I deserialize the model:

    URL curl = ClassUtility.findClasspathResource( "models/my_model.model" );
    final Classifier cls = (Classifier) weka.core.SerializationHelper.read( curl.openConnection().getInputStream() );

Now I have the classifier, but I still need the filter information somehow. My question is: how do I prepare an instance so that my deserialized model can classify it, that is, how do I apply the filter before classification? The raw instance I have to classify has a text field with tokens in it, and the filter is supposed to transform that field into a list of new attributes.

I even tried a FilteredClassifier, setting its classifier to the deserialized one and its filter to a manually created StringToWordVector instance:

    final StringToWordVector filter = new StringToWordVector();
    filter.setOptions(new String[]{"-C", "-P", "x_", "-L"}); // each option token is its own array element
    FilteredClassifier fcls = new FilteredClassifier();
    fcls.setFilter(filter);
    fcls.setClassifier(cls);

The above does not work either. It throws this exception:

    Exception in thread "main" java.lang.NullPointerException: No output instance format defined

What I want to avoid is training in the Java code. Training can be very slow, I will probably have several classifiers to train (with different algorithms as well), and I want my app to start fast.

Recommended answer

Your problem is that your model knows nothing about what the filter did to the data. The StringToWordVector filter transforms the data, but how it does so depends on the input (training) data. A model trained on the transformed data set only works on data that went through exactly the same transformation. To guarantee this, the filter needs to be part of your model.
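The dependency on the training data can be pictured with a toy bag-of-words transform. This is only an illustration and not Weka's StringToWordVector: the class `ToyWordVectorizer` and its `fit`/`transform` methods are hypothetical names invented for this sketch. The vocabulary, and with it the meaning of each output column, is learned from the training texts, so a freshly constructed instance cannot reproduce the transformation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy bag-of-words transform (hypothetical, NOT Weka's StringToWordVector). */
final class ToyWordVectorizer {
    // word -> column index, learned from the training texts
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Learns the word-to-column mapping from the training texts. */
    void fit(List<String> trainingTexts) {
        for (String text : trainingTexts) {
            for (String word : text.toLowerCase().split("\\s+")) {
                vocabulary.putIfAbsent(word, vocabulary.size());
            }
        }
    }

    /** Maps a text to a 0/1 vector over the learned vocabulary; unknown words are dropped. */
    int[] transform(String text) {
        int[] vector = new int[vocabulary.size()];
        for (String word : text.toLowerCase().split("\\s+")) {
            Integer column = vocabulary.get(word);
            if (column != null) {
                vector[column] = 1;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        ToyWordVectorizer trained = new ToyWordVectorizer();
        trained.fit(Arrays.asList("cheap pills", "meeting notes"));
        ToyWordVectorizer fresh = new ToyWordVectorizer(); // never saw the training data

        System.out.println(Arrays.toString(trained.transform("cheap notes"))); // [1, 0, 0, 1]
        System.out.println(Arrays.toString(fresh.transform("cheap notes")));   // []
    }
}
```

The untrained instance produces an empty vector because it has no columns defined at all, which loosely mirrors why a manually created, uninitialized filter cannot feed a classifier that was trained on the output of a different filter instance.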

Using a FilteredClassifier is the right idea, but you have to use it from the beginning:

  • Load the ARFF file
  • Select FilteredClassifier as the classifier
  • Select StringToWordVector as its filter
  • Select IBk as the classifier for the FilteredClassifier
  • Generate/save the model to my_model.binary

The trained and serialized model will then also contain the initialized filter, including the information on how to transform the data.
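On the Java side, classifying a raw text with such a model might look like the sketch below. It assumes the Weka 3.7+ API (`DenseInstance`, list-based `Attribute` constructors) and a raw ARFF structure of one string attribute followed by a nominal class attribute. The class `RawTextClassifier`, the attribute names "text" and "class", the labels, and the path `models/my_model.binary` are placeholders for illustration; the attribute order and class labels have to match your own training file.

```java
import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: classify raw text with a serialized FilteredClassifier (Weka 3.7+ assumed). */
final class RawTextClassifier {

    /**
     * Builds an empty dataset with the structure of the raw (unfiltered) training data.
     * ASSUMPTION: a string attribute "text" followed by a nominal attribute "class";
     * replace names, order and labels with those of your own ARFF file.
     */
    static Instances rawHeader(List<String> classLabels) {
        ArrayList<Attribute> attributes = new ArrayList<>();
        attributes.add(new Attribute("text", (List<String>) null)); // null value list = string attribute
        attributes.add(new Attribute("class", classLabels));        // nominal class attribute
        Instances header = new Instances("raw", attributes, 0);
        header.setClassIndex(1);
        return header;
    }

    /** Wraps the text in an unlabeled instance and returns the predicted class label. */
    static String classify(Classifier model, Instances header, String text) throws Exception {
        Instance instance = new DenseInstance(header.numAttributes()); // all values start as missing
        instance.setDataset(header);
        instance.setValue(header.attribute(0), text);
        // A FilteredClassifier applies its stored filter before calling the wrapped classifier.
        double predicted = model.classifyInstance(instance);
        return header.classAttribute().value((int) predicted);
    }

    public static void main(String[] args) throws Exception {
        // Path and labels are placeholders, not values taken from the question.
        Classifier model = (Classifier) SerializationHelper.read("models/my_model.binary");
        Instances header = rawHeader(Arrays.asList("spam", "ham"));
        System.out.println(classify(model, header, "some tokens to classify"));
    }
}
```

No filter is created or configured in this code: the instance is passed in its raw form, and the deserialized FilteredClassifier is responsible for the transformation.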
