Choosing a stand-alone full-text search server: Sphinx or SOLR?
Question
I'm looking for a stand-alone full-text search server with the following properties:
- Must operate as a stand-alone server that can serve search requests from multiple clients
- Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
- Must be free software and must run on Linux with MySQL as the database
- Must be fast (rules out MySQL's internal full-text search)
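To make the "bulk indexing" requirement concrete, here is a minimal Python sketch of what indexing the result of an SQL query means. It is only an illustration, not how Solr or Sphinx work internally. (In Sphinx, for example, the query is declared with the sql_query setting of an index source.) Assumptions: an in-memory SQLite table stands in for the MySQL "documents" table, and the naive tokenizer and the helpers bulk_index and search are hypothetical.

```python
import re
import sqlite3
from collections import defaultdict

# Hypothetical stand-in: an in-memory SQLite table plays the role of the
# MySQL "documents" table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
conn.executemany(
    "INSERT INTO documents (id, text_to_index) VALUES (?, ?)",
    [
        (1, "Sphinx integrates tightly with MySQL"),
        (2, "Solr is built on top of Lucene"),
        (3, "Lucene and Sphinx are both fast"),
    ],
)

def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately naive analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bulk_index(connection, query):
    """Build an inverted index (term -> set of ids) from the rows an SQL query returns."""
    index = defaultdict(set)
    for doc_id, text in connection.execute(query):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query_text):
    """Return the sorted ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query_text)
    if not terms:
        return []
    # .get() avoids inserting empty entries into the defaultdict for unknown terms.
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)

index = bulk_index(conn, "SELECT id, text_to_index FROM documents")
print(search(index, "sphinx"))       # [1, 3]
print(search(index, "lucene fast"))  # [3]
```

A real engine adds stemming, ranking and on-disk index structures, but the shape is the same: one pass over the query's rows produces a term-to-document map that answers searches without touching MySQL.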
The alternatives I've found that have these properties are:
- Solr (based on Lucene)
- ElasticSearch (also based on Lucene)
- Sphinx
My questions:
- How do they compare?
- Have I missed any alternatives?
- I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?
Answer
I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)
Similarities:
- Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
- Both have a long list of high-traffic sites using them (Solr, Sphinx)
- Both offer commercial support. (Solr, Sphinx)
- Both offer client API bindings for several platforms/languages (Sphinx, Solr)
- Both can be distributed to increase speed and capacity (Sphinx, Solr)
Here are some differences:
- Solr, being an Apache project, is licensed under the Apache License 2.0. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license (rationale).
- Solr is easily embeddable in Java applications.
- Solr is built on top of Lucene, a proven technology more than 8 years old with a huge user base (the list at http://wiki.apache.org/lucene-java/PoweredBy shows only a small part of it). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
- Sphinx integrates more tightly with RDBMSs, especially MySQL.
- Solr can be integrated with Hadoop to build distributed applications.
- Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
- Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
- Solr comes with a spell-checker out of the box.
- Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
- Sphinx doesn't allow partial index updates for field data.
- In Sphinx, all document ids must be unique, non-zero unsigned integers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
- Solr supports field collapsing (currently as an additional patch only) to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
- While Sphinx is designed to only retrieve document ids, in Solr you can directly get whole documents with pretty much any kind of data, making it more independent of any external data store and it saves the extra roundtrip.
- Solr, except when used embedded, runs in a Java web container such as Tomcat or Jetty, which requires additional specific configuration and tuning (or you can use the included Jetty and just launch it with "java -jar start.jar"). Sphinx needs no such additional configuration.
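The point above about Sphinx returning only document ids means the application makes a second round trip to MySQL to load the actual rows. Here is a minimal Python sketch of that step, under stated assumptions: an in-memory SQLite table stands in for MySQL, the ranked id list is hardcoded in place of a real search-server response, and fetch_in_rank_order is a hypothetical helper. With Solr, stored fields can instead come back directly in the search response.

```python
import sqlite3

# Hypothetical stand-in for the MySQL table that holds the full documents.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(1, "Sphinx intro"), (2, "Solr intro"), (3, "Lucene internals")],
)

def fetch_in_rank_order(connection, ranked_ids):
    """Second round trip: load full rows for the ids returned by the search server.

    SQL's IN (...) does not preserve the order of the id list, so the rows are
    re-sorted here to match the relevance ranking.
    """
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = connection.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders})", ranked_ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Skip ids whose rows were deleted after the index was last rebuilt.
    return [by_id[i] for i in ranked_ids if i in by_id]

# Pretend the search server answered with these ids, best match first;
# id 99 simulates a document deleted from the database since the last reindex.
print(fetch_in_rank_order(conn, [3, 1, 99]))
# [(3, 'Lucene internals'), (1, 'Sphinx intro')]
```

Re-sorting in the application is simpler than pushing the ranking into SQL. It also tolerates an index that is slightly stale relative to the database.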
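Faceting and field collapsing, both listed above as Solr features, are easiest to understand by looking at what they compute. The following Python sketch works client-side over a small hypothetical hit list. Solr does this server-side, and its facet counts cover the whole match set rather than just the page returned. The field names, the data, and the helpers facet_counts and collapse are all illustrative assumptions, not Solr's API.

```python
from collections import Counter

# Hypothetical search hits, already sorted by relevance (best first).
hits = [
    {"id": 10, "category": "databases", "site": "example.org"},
    {"id": 11, "category": "search", "site": "example.org"},
    {"id": 12, "category": "search", "site": "example.net"},
    {"id": 13, "category": "databases", "site": "example.org"},
    {"id": 14, "category": "search", "site": "example.com"},
]

def facet_counts(results, field):
    """Faceting: count hits per distinct value of `field`, most common first."""
    return Counter(hit[field] for hit in results).most_common()

def collapse(results, field):
    """Field collapsing: keep only the best-ranked hit for each value of `field`."""
    seen = set()
    collapsed = []
    for hit in results:
        if hit[field] not in seen:
            seen.add(hit[field])
            collapsed.append(hit)
    return collapsed

print(facet_counts(hits, "category"))                # [('search', 3), ('databases', 2)]
print([hit["id"] for hit in collapse(hits, "site")])  # [10, 12, 14]
```

Doing this in the application only works when all matches fit in memory. That is why built-in server-side support matters for large result sets.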
Related questions:
- Full Text Searching with Rails
- Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?