WEKA: Classify instances with a deserialized model
Question
I used Weka Explorer:
- Loaded the ARFF file
- Applied the StringToWordVector filter
- Selected IBk as the best classifier
- Generated/saved the my_model.model binary
In my Java code I deserialize the model:
URL curl = ClassUtility.findClasspathResource( "models/my_model.model" );
final Classifier cls = (Classifier) weka.core.SerializationHelper.read( curl.openConnection().getInputStream() );
Now I have the classifier, but I also need the information about the filter. Where I am stuck is: how do I prepare an instance so that it can be classified by my deserialized model, that is, how do I apply the filter before classification? The raw instance that I have to classify has a text field with tokens in it, and the filter is supposed to transform that field into a list of new attributes.
I even tried to use a FilteredClassifier, setting its classifier to the deserialized one and its filter to a manually created instance of StringToWordVector:
final StringToWordVector filter = new StringToWordVector();
filter.setOptions(new String[]{"-C", "-P x_", "-L"});
FilteredClassifier fcls = new FilteredClassifier();
fcls.setFilter(filter);
fcls.setClassifier(cls);
The above does not work either. It throws the exception:
Exception in thread "main" java.lang.NullPointerException: No output instance format defined
What I am trying to avoid is doing the training in the Java code. Training can be very slow, I will probably have several classifiers to train (with different algorithms as well), and I want my app to start fast.
Recommended answer
Your problem is that your model knows nothing about what the filter did to the data. The StringToWordVector filter changes the data, and how it changes the data depends on the input (training) data. A model trained on this transformed data set will only work on data that underwent exactly the same transformation. To guarantee this, the filter needs to be part of your model.
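To make the dependence on the training data concrete, here is a minimal sketch in plain Java. It does not use Weka: ToyWordVectorFilter, fit and transform are hypothetical names invented for this illustration, and the real StringToWordVector does considerably more. The sketch only shows that the output columns are learned from the training texts, so a freshly constructed filter that has never seen that data has no output format to map new text onto.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Weka-free stand-in for a word-vector filter (illustration only).
class ToyWordVectorFilter {
    // word -> column index; stays null until fit() has seen the training data
    private Map<String, Integer> vocabulary;

    // Learn the output columns from the training texts.
    void fit(List<String> trainingTexts) {
        vocabulary = new LinkedHashMap<>();
        for (String text : trainingTexts) {
            for (String token : text.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    vocabulary.putIfAbsent(token, vocabulary.size());
                }
            }
        }
    }

    // Map a raw text onto the learned columns; words not seen in training are dropped.
    int[] transform(String text) {
        if (vocabulary == null) {
            throw new IllegalStateException("No output format defined: filter was never fitted");
        }
        int[] counts = new int[vocabulary.size()];
        for (String token : text.toLowerCase().split("\\s+")) {
            Integer column = vocabulary.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }
}
```

In this toy version, calling transform on a new ToyWordVectorFilter fails because nothing defines its columns yet, which is presumably similar in spirit to the "No output instance format defined" error from the question.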
Using a FilteredClassifier is the correct idea, but you have to use it from the beginning:
- Load the ARFF file
- Select FilteredClassifier as classifier
- Select StringToWordVector as filter for it
- Select IBk as classifier for the FilteredClassifier
- Generate/Save the model to my_model.binary
The trained and serialized model will then also contain the initialized filter, including the information on how to transform data.
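The idea of saving the filter and the classifier as one object can be sketched without Weka. In the following plain-Java illustration, ToyFilteredClassifier and all of its methods are hypothetical names made up for this example. It keeps a vocabulary (the filter state) and a 1-nearest-neighbour store (standing in for IBk) in a single Serializable object, writes it out, reads it back, and classifies raw text with the restored copy. It is a sketch of the principle under these assumptions, not of Weka's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a FilteredClassifier: filter state and
// classifier state are stored, and therefore serialized, together.
class ToyFilteredClassifier implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Map<String, Integer> vocabulary = new LinkedHashMap<>(); // filter state
    private final List<int[]> trainingVectors = new ArrayList<>();         // classifier state
    private final List<String> trainingLabels = new ArrayList<>();

    // Fit the filter on the raw texts, then store the filtered vectors with their labels.
    void train(List<String> texts, List<String> labels) {
        for (String text : texts) {
            for (String token : tokens(text)) {
                if (!token.isEmpty()) {
                    vocabulary.putIfAbsent(token, vocabulary.size());
                }
            }
        }
        for (int i = 0; i < texts.size(); i++) {
            trainingVectors.add(toVector(texts.get(i)));
            trainingLabels.add(labels.get(i));
        }
    }

    // Apply the stored filter to a raw text, then return the label of the nearest training vector.
    String classify(String rawText) {
        int[] query = toVector(rawText);
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (int i = 0; i < trainingVectors.size(); i++) {
            int[] candidate = trainingVectors.get(i);
            long distance = 0;
            for (int j = 0; j < query.length; j++) {
                long diff = query[j] - candidate[j];
                distance += diff * diff; // squared Euclidean distance
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = trainingLabels.get(i);
            }
        }
        return best;
    }

    private static String[] tokens(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    private int[] toVector(String text) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokens(text)) {
            Integer column = vocabulary.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Serialize to bytes and read back, as saving and later loading a model file would.
    static ToyFilteredClassifier roundTrip(ToyFilteredClassifier model) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(model);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToyFilteredClassifier) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With Weka itself, the analogous step would presumably be to cast the result of weka.core.SerializationHelper.read to FilteredClassifier and call classifyInstance on an instance that still carries the raw string attribute, with a dataset header matching the training ARFF. No manual StringToWordVector setup should be needed in that case, since the filter travels inside the saved model.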