How to classify documents indexed with Lucene
Question
I have indexed a set of documents with Lucene (fields: content, category). Each document has its own category, but some of them are labeled as uncategorized. Is there an easy way to classify these uncategorized documents in Java?
Recommended answer
As of Lucene 5.2.1, you can use the documents already in your index to classify new documents. Out of the box, Lucene offers a naive Bayes classifier, a k-nearest-neighbor classifier (based on the MoreLikeThis class), and a perceptron-based classifier.
The drawback is that all of these classes are marked with experimental warnings and documented with links to Wikipedia.
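To make the naive Bayes idea concrete without tying it to one Lucene version, here is a minimal plain-Java sketch of what such a classifier does with (content, category) pairs: it learns term frequencies per category from the labeled documents, then picks the most probable category for an unlabeled one. This is not Lucene's implementation. The class NaiveBayesSketch, its methods, the regex tokenizer, and the sample texts are all hypothetical and only illustrate the technique. In Lucene itself the corresponding classes should be in the org.apache.lucene.classification package (for example SimpleNaiveBayesClassifier), but their constructors have changed between releases, so check the Javadoc for the version you use.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: multinomial naive Bayes with add-one (Laplace) smoothing. */
class NaiveBayesSketch {
    // category -> number of training documents seen
    private final Map<String, Integer> docCounts = new HashMap<>();
    // category -> (term -> number of occurrences)
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // category -> total number of term occurrences
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // every distinct term seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Crude stand-in for a Lucene Analyzer: lowercase, split on non-word characters.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    /** Adds one labeled document (the "content" and "category" fields) to the model. */
    public void train(String category, String content) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String term : tokenize(content)) {
            if (term.isEmpty()) continue; // split() can yield an empty leading token
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(category, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    /** Returns the most probable category, or null if nothing has been trained. */
    public String classify(String content) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log prior: share of training documents in this category
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(category);
            int total = totalTerms.getOrDefault(category, 0);
            for (String term : tokenize(content)) {
                if (term.isEmpty()) continue;
                int c = counts.getOrDefault(term, 0);
                // log likelihood with add-one smoothing so unseen terms do not zero out
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        // Labeled documents play the role of the categorized part of the index.
        nb.train("sports", "the team won the football match");
        nb.train("sports", "a late goal decided the match");
        nb.train("tech", "the new search library indexes documents");
        nb.train("tech", "the index stores documents and fields");
        // Unlabeled documents play the role of the "uncategorized" ones.
        System.out.println(nb.classify("which team scored the goal"));
        System.out.println(nb.classify("search the index for documents"));
    }
}
```

With this toy data the two sample queries come out as sports and tech respectively. A real setup would read the labeled documents from the index, skip the ones marked uncategorized during training, and write the predicted category back for those.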