How do we create a simple search engine using Lucene, Solr or Nutch?


Question

Our company has thousands of PDF documents. How do we create a simple search engine using Lucene, Solr or Nutch? We'll provide a basic Java/JSP web page where people can type in words and perform basic AND/OR queries, then show them links to all matching PDFs.

Recommended answer

None of the projects in the Lucene family can natively process PDFs, but there are utilities you can drop in, and well-written examples of how to roll your own.

Lucene will do pretty much whatever you need it to do, but it carries overhead in terms of your time, as Tony said above. A few thousand documents really isn't that many, so you might be able to get away with a lighter-weight alternative.
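The answer does not name a specific lighter-weight alternative. Purely as an illustration of how small the core of an AND/OR keyword search can be at this scale, here is a minimal sketch of an in-memory inverted index. The class name `TinyIndex`, its methods, and the sample file names are hypothetical, and the sketch assumes the text has already been extracted from each PDF by some other tool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: a tiny in-memory inverted index.
class TinyIndex {
    // Maps each term to the sorted set of document ids (e.g. PDF paths) containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexes text that was already extracted from a document.
    public void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // AND query: documents that contain every term in the query.
    public Set<String> searchAll(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // OR query: documents that contain at least one term in the query.
    public Set<String> searchAny(String query) {
        Set<String> result = new TreeSet<>();
        for (String term : tokenize(query)) {
            result.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return result;
    }

    // Lowercases the text and splits it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("a.pdf", "Quarterly sales report");
        index.add("b.pdf", "Annual sales forecast");
        System.out.println(index.searchAll("sales report"));
        System.out.println(index.searchAny("report forecast"));
    }
}
```

A sketch like this has no ranking, stemming, phrase search, or persistence. Those are the kinds of features that Lucene and Solr provide and that justify their setup cost.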

That said, I would still recommend looking at Solr - it's much easier to set up than Lucene, supports backups, replication, etc., and has a nifty JSON interface that would fit your use case very well: http://wiki.apache.org/solr/SolJSON
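To make the JSON interface concrete, the following sketch builds the kind of URL a Java/JSP page could request from Solr's `/select` handler. The parameters `q`, `q.op`, and `wt=json` are standard Solr query parameters. The host, the core name `pdfs`, and the helper name `selectUrl` are assumptions for illustration, and which field the words are matched against depends on how your schema and default search field are configured.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Solr select URL that returns JSON.
class SolrUrls {
    // baseUrl and core are placeholders for your own installation.
    // matchAll = true requires every word (AND); false accepts any word (OR).
    public static String selectUrl(String baseUrl, String core, String userQuery, boolean matchAll) {
        // URL-encode the user's words so spaces and symbols survive the request.
        String q = URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
        String op = matchAll ? "AND" : "OR";
        return baseUrl + "/" + core + "/select?q=" + q + "&q.op=" + op + "&wt=json";
    }

    public static void main(String[] args) {
        // Assumed local Solr instance with a core named "pdfs".
        System.out.println(selectUrl("http://localhost:8983/solr", "pdfs", "annual report", true));
    }
}
```

The page would fetch this URL, parse the JSON response, and render a link for each returned document. Because the user's text is passed through to Solr's query parser, users can also type `AND` and `OR` themselves; whether to allow that or escape the input first is a design choice for the page.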
