Identification of the important document

Problem description

I have a set of text documents in Java. I have to identify the most important document (just as an expert would) using a computer.

For example, I have 10 books on Java, and the system identifies the Java Complete Reference as the most important or most relevant document (based on similarity with the Wikipedia page about Java).

One method would be to have a reference document and measure the similarity between it and each document in the set at hand (as in the previous example), then report the document with the maximum similarity as the most important one.

I want to identify other, more efficient methods of doing this. Please suggest other methods for finding the relevant document (in an unsupervised way if possible).

Recommended answer

You are talking about ranked full-text search. Try looking at Lucene, the full-text search engine:
http://incubator.apache.org/lucene.net/
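
To make the idea of ranking against a reference text concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: the class `ReferenceRanker`, its method names, the sample texts, and the smoothed IDF formula are all assumptions chosen for illustration. It weights terms by TF-IDF and orders the documents by cosine similarity to the reference text. A search engine such as Lucene would handle tokenization, indexing, and scoring itself, and its scoring function may differ from this one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
public class ReferenceRanker {

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Raw term frequencies of one text.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokenize(text)) counts.merge(t, 1, Integer::sum);
        return counts;
    }

    // TF-IDF weights, assuming a smoothed idf = ln((1 + N) / (1 + df)) + 1.
    static Map<String, Double> weigh(Map<String, Integer> counts,
                                     Map<String, Integer> docFreq, int numDocs) {
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((1.0 + numDocs) / (1.0 + df)) + 1.0;
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    // Cosine similarity between two sparse vectors; 0 if either is empty.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    // Returns document names ordered from most to least similar to the reference.
    static List<String> rank(String reference, Map<String, String> docs) {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            Map<String, Integer> c = termCounts(e.getValue());
            counts.put(e.getKey(), c);
            for (String term : c.keySet()) docFreq.merge(term, 1, Integer::sum);
        }
        int n = docs.size();
        Map<String, Double> refVector = weigh(termCounts(reference), docFreq, n);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            scores.put(e.getKey(), cosine(refVector, weigh(e.getValue(), docFreq, n)));
        }
        List<String> names = new ArrayList<>(docs.keySet());
        names.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return names;
    }

    public static void main(String[] args) {
        // Tiny made-up corpus standing in for the books and the reference page.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("complete-reference",
                 "java language classes objects interfaces generics threads collections");
        docs.put("cookbook", "recipes for pasta bread soup");
        docs.put("intro", "java basics variables loops");
        String reference =
            "java is a programming language with classes objects interfaces and generics";
        System.out.println(rank(reference, docs));
    }
}
```

With real books, the reference text would be much longer and common words would need stop-word filtering or stemming, which this sketch leaves out; otherwise a shared word such as "and" can outweigh a genuinely topical one.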

