Load Naïve Bayes model in Java code using weka jar


Problem Description

I have used Weka and made a Naive Bayes classifier by using the Weka GUI. Then I saved this model by following this tutorial. Now I want to load this model through Java code, but I am unable to find any way to load a saved model using Weka.

This is my requirement: I have to build the model separately and then use it in a separate program.

If anyone can guide me in this regard I will be thankful to you.

Recommended Answer

You can easily load a saved model in Java using this command:

Classifier myCls = (Classifier) weka.core.SerializationHelper.read(pathToModel);

For a complete workflow in Java I wrote the following article in SO Documentation, now copied here:

  • Create training instances from .arff file

private static Instances getDataFromFile(String path) throws Exception{

    DataSource source = new DataSource(path);
    Instances data = source.getDataSet();

    if (data.classIndex() == -1){
        data.setClassIndex(data.numAttributes()-1);
        //last attribute as class index
    }

    return data;    
}

Instances trainingData = getDataFromFile(pathToArffFile);

    • Use StringToWordVector to transform your string attributes to number representation:

      • Important features of this filter:

      1. tf-idf representation
      2. Stemming
      3. Lowercase words
      4. Stopwords
      5. n-gram representation*

      StringToWordVector filter = new StringToWordVector();
      filter.setWordsToKeep(1000000);
      if(useIdf){
          filter.setIDFTransform(true);
      }
      filter.setTFTransform(true);
      filter.setLowerCaseTokens(true);
      filter.setOutputWordCounts(true);
      filter.setMinTermFreq(minTermFreq);
      filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL,StringToWordVector.TAGS_FILTER));
      NGramTokenizer t = new NGramTokenizer();
      t.setNGramMaxSize(maxGrams);
      t.setNGramMinSize(minGrams);    
      filter.setTokenizer(t);     
      WordsFromFile stopwords = new WordsFromFile();
      stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
      filter.setStopwordsHandler(stopwords);
      if (useStemmer){
          Stemmer s = new /*Iterated*/LovinsStemmer();
          filter.setStemmer(s);
      }
      filter.setInputFormat(trainingData);
      

      • Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);

        Create the LibLinear Classifier

        1. SVMType 0 below corresponds to L2-regularized logistic regression
        2. Set setProbabilityEstimates(true) to print output probabilities

        Classifier cls = null;
        LibLINEAR liblinear = new LibLINEAR();
        liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
        liblinear.setProbabilityEstimates(true);
        //liblinear.setBias(1); //default value
        cls = liblinear;
        cls.buildClassifier(trainingData);

      • Save model

        System.out.println("Saving the model...");
        ObjectOutputStream oos;
        oos = new ObjectOutputStream(new FileOutputStream(path + "mymodel.model"));
        oos.writeObject(cls);
        oos.flush();
        oos.close();
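        The save step above (and weka.core.SerializationHelper) both rely on standard Java object serialization. A minimal stdlib-only sketch of that round trip, with a HashMap standing in for the classifier so no Weka dependency is needed:

        ```java
        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.io.IOException;
        import java.io.ObjectInputStream;
        import java.io.ObjectOutputStream;
        import java.util.HashMap;

        public class SerializationDemo {
            // Round-trips any serializable object through a byte stream,
            // mirroring what writeObject/readObject do against a model file.
            public static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                    oos.writeObject(obj);
                }
                try (ObjectInputStream ois = new ObjectInputStream(
                        new ByteArrayInputStream(bytes.toByteArray()))) {
                    return ois.readObject();
                }
            }

            public static void main(String[] args) throws Exception {
                HashMap<String, Double> fakeModel = new HashMap<>();
                fakeModel.put("weight", 0.42);
                @SuppressWarnings("unchecked")
                HashMap<String, Double> restored = (HashMap<String, Double>) roundTrip(fakeModel);
                System.out.println(restored.get("weight")); // prints 0.42
            }
        }
        ```

        The same mechanism is why any classifier saved from the Weka GUI can be read back with a plain ObjectInputStream or with SerializationHelper.read.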

        Create testing instances from .arff file

        Instances testingData = getDataFromFile(pathToArffFile);

        Load the classifier

        Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path+"mymodel.model");

        • Use the same StringToWordVector filter as above or create a new one for testingData, but remember to use the trainingData for this command: filter.setInputFormat(trainingData); This will make training and testing instances compatible. Alternatively, you could use InputMappedClassifier
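        The InputMappedClassifier alternative mentioned above could be sketched roughly as follows. This is an untested sketch assuming weka.jar is on the classpath; the model path is a placeholder, and setModelPath corresponds to the class's -L option, which loads the serialized model along with its training header:

        ```java
        import weka.classifiers.misc.InputMappedClassifier;

        public class LoadMapped {
            public static void main(String[] args) throws Exception {
                InputMappedClassifier mapped = new InputMappedClassifier();
                // Load the serialized model and its training header from disk.
                mapped.setModelPath("mymodel.model");
                // Calls like mapped.classifyInstance(inst) then match the test
                // instance's attributes by name against the training header,
                // so the manual filter bookkeeping above can be avoided.
            }
        }
        ```

        This trades explicit filter reuse for attribute-name matching, which is convenient when training and testing headers were produced separately.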

        Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);

        Classify!

        1. Get the class value for every instance in the testing set

        for (int j = 0; j < testingData.numInstances(); j++) {
            double res = myCls.classifyInstance(testingData.get(j));
        }

        res is a double value that corresponds to the nominal class that is defined in the .arff file. To get the nominal class use: testingData.classAttribute().value((int)res)

        2. Get the probability distribution for every instance

         for (int j = 0; j < testingData.numInstances(); j++) {
            double[] dist = myCls.distributionForInstance(testingData.get(j));
         }
        

        dist is a double array that contains the probabilities for every class defined in the .arff file.

        Note: the classifier should support probability distributions, and they should be enabled with: myClassifier.setProbabilityEstimates(true);
