Load Naïve Bayes model in java code using weka jar
Problem Description
I have used Weka and made a Naive Bayes classifier using the Weka GUI. Then I saved this model by following a tutorial. Now I want to load this model through Java code, but I am unable to find any way to load a saved model using Weka.
My requirement is that I have to build the model separately and then use it in a separate program.
If anyone can guide me in this regard, I will be thankful.
Answer
You can easily load a saved model in Java using this command:
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);
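For context, SerializationHelper.read is plain Java object deserialization under the hood. A minimal stdlib-only sketch of the same round trip (a String stands in for the trained classifier here, since the Weka jar itself is assumed, not bundled):

```java
import java.io.*;

public class SerializationDemo {
    // Write an object to disk and read it back, mirroring what
    // weka.core.SerializationHelper.write/read do internally.
    static Object roundTrip() throws Exception {
        File f = File.createTempFile("mymodel", ".model");
        f.deleteOnExit();
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(f))) {
            oos.writeObject("dummy-classifier"); // stand-in for a trained Classifier
        }
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(f))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // prints dummy-classifier
    }
}
```

With the real Weka jar on the classpath, the cast `(Classifier)` on the returned Object is all that is added on top of this.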
For a complete workflow in Java I wrote the following article in SO Documentation, now copied here:
Create training instances from .arff file
private static Instances getDataFromFile(String path) throws Exception {
    DataSource source = new DataSource(path);
    Instances data = source.getDataSet();
    if (data.classIndex() == -1) {
        // use the last attribute as the class index
        data.setClassIndex(data.numAttributes() - 1);
    }
    return data;
}
Instances trainingData = getDataFromFile(pathToArffFile);
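For reference, the .arff file that getDataFromFile loads for a text task typically has a single string attribute plus a nominal class as the last attribute. The relation, attribute, and class names below are illustrative, not from the original question:

```
@relation reviews

@attribute text string
@attribute class {pos,neg}

@data
'great product, works as advertised',pos
'stopped working after one day',neg
```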
Use StringToWordVector to transform your string attributes to a numeric representation:
Important features of this filter:
- tf-idf representation
- stemming
- lowercase words
- stopwords
- n-gram representation*
StringToWordVector filter = new StringToWordVector();
filter.setWordsToKeep(1000000);
if (useIdf) {
    filter.setIDFTransform(true);
}
filter.setTFTransform(true);
filter.setLowerCaseTokens(true);
filter.setOutputWordCounts(true);
filter.setMinTermFreq(minTermFreq);
filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
NGramTokenizer t = new NGramTokenizer();
t.setNGramMaxSize(maxGrams);
t.setNGramMinSize(minGrams);
filter.setTokenizer(t);
WordsFromFile stopwords = new WordsFromFile();
stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
filter.setStopwordsHandler(stopwords);
if (useStemmer) {
    Stemmer s = new /*Iterated*/LovinsStemmer();
    filter.setStemmer(s);
}
filter.setInputFormat(trainingData);
Apply the filter to trainingData:
trainingData = Filter.useFilter(trainingData, filter);
Create the LibLinear Classifier
- SVMType 0 below corresponds to the L2-regularized logistic regression
- Set setProbabilityEstimates(true) to print the output probabilities
Classifier cls = null;
LibLINEAR liblinear = new LibLINEAR();
liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
liblinear.setProbabilityEstimates(true);
// liblinear.setBias(1); // default value
cls = liblinear;
cls.buildClassifier(trainingData);
Save model
System.out.println("Saving the model...");
ObjectOutputStream oos;
oos = new ObjectOutputStream(new FileOutputStream(path + "mymodel.model"));
oos.writeObject(cls);
oos.flush();
oos.close();
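Weka also ships a one-line helper that wraps this same ObjectOutputStream logic, the counterpart of the SerializationHelper.read call used for loading above (a fragment, assuming the same `path` and `cls` variables):

```java
// equivalent to the manual ObjectOutputStream code above
weka.core.SerializationHelper.write(path + "mymodel.model", cls);
```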
Create testing instances from .arff file
Instances testingData = getDataFromFile(pathToArffFile);
Load classifier
Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");
Use the same StringToWordVector filter as above or create a new one for testingData, but remember to use the trainingData for this command:
filter.setInputFormat(trainingData);
This will make the training and testing instances compatible. Alternatively, you could use InputMappedClassifier.
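A sketch of the InputMappedClassifier alternative, which matches incoming attributes to the training header by name. Treat this as an assumption-laden outline built from Weka's InputMappedClassifier wrapper, not a drop-in replacement for the filter approach:

```java
// wrap the trained model so incoming instances are matched
// to the training header by attribute name
InputMappedClassifier mapped = new InputMappedClassifier();
mapped.setClassifier(liblinear);      // the classifier built above
mapped.buildClassifier(trainingData); // stores the training header
// testingData no longer has to share the exact same header:
double res = mapped.classifyInstance(testingData.get(0));
```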
Apply the filter to testingData:
testingData = Filter.useFilter(testingData, filter);
Classify!
1. Get the class value for every instance in the testing set

for (int j = 0; j < testingData.numInstances(); j++) {
    double res = myCls.classifyInstance(testingData.get(j));
}

res is a double value that corresponds to the nominal class that is defined in the .arff file. To get the nominal class use: testingData.classAttribute().value((int) res)
2. Get the probability distribution for every instance

for (int j = 0; j < testingData.numInstances(); j++) {
    double[] dist = myCls.distributionForInstance(testingData.get(j));
}

dist is a double array that contains the probabilities for every class defined in the .arff file.

Note: the classifier should support probability distributions. Enable them with:
myClassifier.setProbabilityEstimates(true);
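As a side note, classifyInstance is effectively the argmax over the array that distributionForInstance returns. A tiny stdlib-only illustration of that relationship (the sample distribution below is made up):

```java
public class ArgmaxDemo {
    // Mirrors how a classifier turns a probability distribution
    // into a class index: pick the index of the largest probability.
    public static int argmax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] dist = {0.1, 0.7, 0.2}; // e.g. from distributionForInstance
        System.out.println(argmax(dist)); // prints 1
    }
}
```

Casting the result to int and looking it up with classAttribute().value(...) then yields the nominal label, exactly as in step 1 above.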