How to classify documents indexed with Lucene


Question

I have indexed a set of documents with Lucene (fields: content, category). Each document has its own category, but some of them are labeled as uncategorized. Is there an easy way to classify these documents in Java?

Recommended answer

As of Lucene 5.2.1, you can use documents that are already indexed to classify new documents. Out of the box, Lucene offers a naive Bayes classifier, a k-nearest-neighbor classifier (based on the MoreLikeThis class), and a perceptron-based classifier.

The drawback is that all of these classes are marked as experimental, and their documentation consists largely of links to Wikipedia.
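Since the experimental classes may change between releases, it can help to see what a naive Bayes classifier does with the labeled documents before relying on them. The following is a minimal plain-Java sketch and not Lucene's implementation: the class name `NaiveBayesSketch`, its `train` and `classify` methods, and the crude tokenizer (a stand-in for a Lucene Analyzer) are all hypothetical names chosen for illustration. It assumes you have already read the content and category values out of the index yourself.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative multinomial naive Bayes with add-one smoothing (not Lucene code). */
class NaiveBayesSketch {
    // category -> number of training documents seen
    private final Map<String, Integer> docCounts = new HashMap<>();
    // category -> (term -> occurrences in that category)
    private final Map<String, Map<String, Integer>> termCounts = new HashMap<>();
    // category -> total number of term occurrences
    private final Map<String, Integer> totalTerms = new HashMap<>();
    // all distinct terms seen during training
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Assumption: lowercase and split on non-alphanumerics, in place of a real Analyzer.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Feed one already-categorized document (its category and content fields). */
    void train(String category, String content) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                termCounts.computeIfAbsent(category, c -> new HashMap<>());
        for (String term : tokenize(content)) {
            if (term.isEmpty()) continue;
            counts.merge(term, 1, Integer::sum);
            totalTerms.merge(category, 1, Integer::sum);
            vocabulary.add(term);
        }
    }

    /** Returns the most probable category, or null if nothing was trained. */
    String classify(String content) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // log prior: share of training documents in this category
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = termCounts.get(category);
            int total = totalTerms.getOrDefault(category, 0);
            for (String term : tokenize(content)) {
                // terms never seen in training carry no evidence, so skip them
                if (term.isEmpty() || !vocabulary.contains(term)) continue;
                int count = counts.getOrDefault(term, 0);
                // log likelihood with add-one (Laplace) smoothing
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

In this sketch you would call `train` once for every document that already has a category, then call `classify` on the content of each document labeled as uncategorized and write the result back to the index. Lucene's own classifiers work from the index directly, so check the Javadoc of the version you use for the actual class names and signatures.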
