Simple text classification using Naive Bayes (Weka) in Java

Views: 625 Published: 2019/1/2 21:01:37 Tags: java weka text-classification naivebayes arff

Problem description

I am trying to do text classification with Naive Bayes from the Weka library in my Java code, but I think the classification results are incorrect, and I don't know what the problem is. I use ARFF files as input.

This is my training data:

@relation hamspam

@attribute text string
@attribute class {spam,ham}

@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham

This is my testing data:

@relation test

@attribute text string
@attribute class {spam,ham}

@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good',?
'very very good',?

This is my code:

public static void NaiveBayes(String training_file, String testing_file) throws FileNotFoundException, IOException, Exception {
    //filter
    StringToWordVector filter = new StringToWordVector();

    Classifier naive = new NaiveBayes();

    //training data
    Instances train = new Instances(new BufferedReader(new FileReader(training_file)));
    int lastIndex = train.numAttributes() - 1;
    train.setClassIndex(lastIndex);
    filter.setInputFormat(train);
    train = Filter.useFilter(train, filter);

    //testing data
    Instances test = new Instances(new BufferedReader(new FileReader(testing_file)));
    test.setClassIndex(lastIndex);
    filter.setInputFormat(test);
    Instances test2 = Filter.useFilter(test, filter);

    naive.buildClassifier(train);

    for (int i = 0; i < test2.numInstances(); i++) {
        System.out.println(test.instance(i));
        double index = naive.classifyInstance(test2.instance(i));
        String className = train.attribute(0).value((int) index);
        System.out.println(className);
    }
}

The results show that data which should have been classified as spam is classified as ham, and data which should have been classified as ham is classified as spam. What is the problem? Please help.
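
As a sanity check on which labels to expect, here is a minimal sketch of a hand-rolled multinomial Naive Bayes with add-one (Laplace) smoothing, trained on the same seven training rows. It does not use Weka and is not what Weka's `NaiveBayes` does internally (by default that classifier models numeric attributes with a normal distribution), so the labels may differ. The class name `TinyNaiveBayes`, the method `fromQuestionData`, and the tokenizer that splits on non-letters are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone sketch: multinomial Naive Bayes with add-one smoothing.
public class TinyNaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not a letter a-z.
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("[^a-z]+");
    }

    // Update per-class document and word counts with one labeled text.
    public void train(String text, String label) {
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Return the label with the highest log posterior; unseen words are skipped.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                if (word.isEmpty() || !vocabulary.contains(word)) continue;
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Train on the seven rows from the question's training ARFF file.
    public static TinyNaiveBayes fromQuestionData() {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("good", "ham");
        nb.train("good", "ham");
        nb.train("very good", "ham");
        nb.train("bad", "spam");
        nb.train("very bad", "spam");
        nb.train("very bad, very bad", "spam");
        nb.train("good good bad", "ham");
        return nb;
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = fromQuestionData();
        String[] tests = {"good bad very bad", "good bad very bad", "good",
                "good very good", "bad", "very good", "very very good"};
        for (String text : tests) {
            System.out.println(text + " -> " + nb.classify(text));
        }
    }
}
```

Under these assumptions the sketch labels 'good bad very bad' and 'bad' as spam and the remaining test rows as ham. If Weka's output disagrees, two things that may be worth checking in the code from the question are whether the class name is read through `classAttribute()` rather than a fixed attribute position, and whether the `StringToWordVector` filter is initialized once on the training data and then reused for the test data instead of being given a new input format.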
