Choosing a stand-alone full-text search server: Sphinx or SOLR?


Question


I'm looking for a stand-alone full-text search server with the following properties:

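  • Must operate as a stand-alone server that can serve search requests from multiple clients
  • Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
  • Must be free software and must run on Linux with MySQL as the database
  • Must be fast (this rules out MySQL's internal full-text search)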
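As an illustration of the bulk-indexing requirement, here is a minimal Python sketch of how the rows returned by such a query could be turned into batched JSON bodies for a Solr update request. (Sphinx, by contrast, takes the SQL query directly in its configuration through the sql_query setting of a data source.) The sketch assumes an in-memory SQLite database as a stand-in for MySQL, a hypothetical schema field named "text", and a hypothetical default batch size of 500; actually POSTing the bodies to Solr is left out.

```python
import json
import sqlite3
from typing import Iterator


def solr_update_batches(conn: sqlite3.Connection, batch_size: int = 500) -> Iterator[str]:
    """Yield JSON bodies that could be POSTed to a Solr /update handler.

    Runs the bulk-indexing query from the question and groups the rows
    into batches so a large table is not sent in a single request.
    """
    cursor = conn.execute("SELECT id, text_to_index FROM documents")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        # Solr's JSON update format accepts a plain array of documents.
        # "text" is a hypothetical field name; it must exist in the Solr schema.
        docs = [{"id": str(row_id), "text": text} for row_id, text in rows]
        yield json.dumps(docs)


if __name__ == "__main__":
    # In-memory SQLite stands in for the MySQL "documents" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
    conn.executemany(
        "INSERT INTO documents (id, text_to_index) VALUES (?, ?)",
        [(i, f"document number {i}") for i in range(1, 1201)],
    )
    for body in solr_update_batches(conn):
        # A real indexer would POST each body to something like
        # http://localhost:8983/solr/<core>/update?commit=true
        print(len(json.loads(body)))
```

Batching keeps each request bounded in size; with the 1,200 demo rows the loop prints 500, 500 and 200.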


The alternatives I've found that have these properties are:

  • Solr (based on Lucene)
  • ElasticSearch (also based on Lucene)
  • Sphinx

My questions:

  • How do they compare?
  • Have I missed any alternatives?
  • I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?

Recommended answer


I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)

Similarities:

  • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
  • Both have a long list of high-traffic sites using them (Solr, Sphinx)
  • Both offer commercial support. (Solr, Sphinx)
  • Both offer client API bindings for several platforms/languages (Sphinx, Solr)
  • Both can be distributed to increase speed and capacity (Sphinx, Solr)
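Because Solr's API is plain HTTP returning JSON, even without a dedicated binding a client needs little more than a URL builder and a JSON parser (Sphinx clients typically go through the SphinxAPI libraries or SphinxQL over the MySQL protocol instead). The Python sketch below builds a query URL for Solr's standard /select handler and extracts ids from a response in Solr's JSON shape. The host, the default port 8983 and the core name "documents" are assumptions for illustration, and no network request is made.

```python
import json
from urllib.parse import urlencode


def build_solr_query_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select request handler."""
    params = {
        "q": query,
        "fl": "id,score",  # return only the id and the relevance score
        "rows": rows,      # maximum number of results
        "wt": "json",      # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


def ids_from_response(body: str) -> list[str]:
    """Extract document ids from a Solr JSON response body."""
    data = json.loads(body)
    return [doc["id"] for doc in data["response"]["docs"]]


if __name__ == "__main__":
    url = build_solr_query_url("http://localhost:8983", "documents", "text:sphinx")
    print(url)
    # Canned response in Solr's JSON shape; a real client would fetch `url`.
    sample = '{"response": {"numFound": 2, "start": 0, "docs": [{"id": "7"}, {"id": "3"}]}}'
    print(ids_from_response(sample))
```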

Here are some differences:

  • Solr, being an Apache project, is obviously Apache2-licensed. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license (rationale)
  • Solr is easily embeddable in Java applications.
  • Solr is built on top of Lucene, which is a proven technology over 8 years old with a huge user base (this is only a small part). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
  • Sphinx integrates more tightly with RDBMSs, especially MySQL.
  • Solr can be integrated with Hadoop to build distributed applications.
  • Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
  • Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
  • Solr comes with a spell-checker out of the box.
  • Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
  • Sphinx doesn't allow partial index updates for field data.
  • In Sphinx, all document ids must be unique unsigned non-zero integer numbers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
  • Solr supports field collapsing (currently as an additional patch only) to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
  • While Sphinx is designed to only retrieve document ids, in Solr you can directly get whole documents with pretty much any kind of data, making it more independent of any external data store and it saves the extra roundtrip.
  • Solr, except when used embedded, runs in a Java web container such as Tomcat or Jetty, which requires additional specific configuration and tuning (or you can use the included Jetty and just launch it with java -jar start.jar). Sphinx needs no such additional configuration.
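The point that Sphinx returns only document ids implies a second lookup in the database to get the actual content. A minimal Python sketch of that extra roundtrip follows. It assumes the ranked ids have already come back from a search (hard-coded here as hypothetical values), uses in-memory SQLite in place of MySQL, and re-sorts the rows in the client because SQL's IN (...) gives no ordering guarantee.

```python
import sqlite3
from typing import Sequence


def fetch_in_rank_order(conn: sqlite3.Connection, ranked_ids: Sequence[int]) -> list[tuple[int, str]]:
    """Fetch rows for ids returned by a search engine, keeping its ranking.

    SQL's IN (...) does not preserve order, so the rows are re-sorted by
    the position of each id in ranked_ids.
    """
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, text_to_index FROM documents WHERE id IN ({placeholders})",
        list(ranked_ids),
    ).fetchall()
    position = {doc_id: i for i, doc_id in enumerate(ranked_ids)}
    return sorted(rows, key=lambda row: position[row[0]])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    # Hypothetical ids as a Sphinx query might return them, best match first.
    print(fetch_in_rank_order(conn, [3, 1]))  # [(3, 'gamma'), (1, 'alpha')]
```

In MySQL the reordering could instead be pushed into the query with ORDER BY FIELD(id, ...); sorting in the client keeps the sketch portable. With Solr, storing the needed fields in the index and requesting them via the fl parameter avoids this second query altogether.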
