Comparison of full text search engines - Lucene, Sphinx, PostgreSQL, MySQL?


Question





I'm building a Django site and I am looking for a search engine.

A few candidates:

  • Lucene/Lucene with Compass/Solr
  • Sphinx
  • PostgreSQL built-in full text search
  • MySQL built-in full text search

Selection criteria:

  • result relevance and ranking
  • searching and indexing speed
  • ease of use and ease of integration with Django
  • resource requirements - site will be hosted on a VPS, so ideally the search engine wouldn't require a lot of RAM and CPU
  • scalability
  • extra features such as "did you mean?", related searches, etc

Anyone who has had experience with the search engines above, or other engines not in the list -- I would love to hear your opinions.

EDIT: As for indexing needs, as users keep entering data into the site, that data would need to be indexed continuously. It doesn't have to be real time, but ideally new data would show up in the index with no more than 15-30 minutes' delay.

Answer

Good to see someone's chimed in about Lucene - because I've no idea about that.

Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.

  • Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
  • Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries and un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.
  • I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source though.
  • The search service daemon (searchd) is pretty low on memory usage - and you can set limits on how much memory the indexer process uses too.
  • Scalability is where my knowledge is more sketchy - but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others, though, is that it's pretty damn good under high load, so scaling it out across multiple machines isn't usually something you need to deal with.
  • There's no support for 'did-you-mean', etc - although these can be done with other tools easily enough. Sphinx does stem words using dictionaries, though, so 'driving' and 'drive' (for example) would be considered the same in searches.
  • Sphinx doesn't allow partial index updates for field data, though. The common approach is to maintain a delta index holding all the recent changes, and re-index it after every change (those new results appear within a second or two). Because the delta holds only a small amount of data, re-indexing it takes a matter of seconds. You will still need to re-index the main dataset regularly (how regularly depends on the volatility of your data - every day? every hour?). The fast indexing speeds keep this all pretty painless.
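
To make the main-plus-delta idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Sphinx at all: `ToyIndex` and `MainPlusDelta` are made-up names, the "database" is just a dict, and dropping stale main hits for edited documents is an assumption of this sketch rather than a description of what Sphinx does. The only point is to show why re-indexing a small delta after every change is cheap while the big index gets rebuilt on a schedule.

```python
from collections import defaultdict


class ToyIndex:
    """A tiny inverted index that can only be rebuilt in full, never patched."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.doc_ids = set()              # every document id covered by this build

    def rebuild(self, docs):
        """Throw the old index away and build it again from {doc_id: text}."""
        self.postings = defaultdict(set)
        self.doc_ids = set(docs)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


class MainPlusDelta:
    """Hypothetical wrapper: a large main index plus a small delta index."""

    def __init__(self):
        self.store = {}       # stands in for the database table
        self.changed = set()  # ids added or edited since the last main rebuild
        self.main = ToyIndex()
        self.delta = ToyIndex()

    def save(self, doc_id, text):
        """Write a document, then rebuild only the small delta index."""
        self.store[doc_id] = text
        self.changed.add(doc_id)
        self.delta.rebuild({i: self.store[i] for i in self.changed})

    def reindex_main(self):
        """The scheduled job (hourly, daily...): rebuild main, empty the delta."""
        self.main.rebuild(dict(self.store))
        self.changed.clear()
        self.delta.rebuild({})

    def search(self, term):
        # Assumption of this sketch: a main hit for a document that also
        # lives in the delta is stale, so the delta version wins.
        fresh_main = self.main.search(term) - self.delta.doc_ids
        return fresh_main | self.delta.search(term)


site = MainPlusDelta()
site.save(1, "django search engine")
site.save(2, "sphinx delta index")
site.reindex_main()                   # everything now lives in the main index
site.save(3, "fresh sphinx post")     # new document, lands in the delta
site.save(1, "django forms")          # edited document, also lands in the delta
print(sorted(site.search("sphinx")))  # [2, 3]
print(sorted(site.search("search")))  # [] because the old text of doc 1 is gone
```

Each `save` only rebuilds the delta, whose size is bounded by the number of changes since the last `reindex_main`, which is why that step stays fast no matter how large the main dataset grows.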

I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.

I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise or feature-wise with Sphinx, Lucene or Solr.
