When to consider Solr


Question

I am working on an application that needs to do interesting things with search, including full-text search, hit highlighting, faceted search, and so on.

The dataset is likely to contain between 3,000 and 10,000 records with 20-30 fields each, all stored in MySQL. The site's traffic is likely to be on the small side of medium.

All of these requirements could be achieved (clunkily) in MySQL, but at what point (in terms of data size and traffic levels) does it become worth looking at more focused technologies like Solr or Sphinx?

Recommended answer

This question calls for a very broad answer. There may well be specifics that make one system superior to another for a particular use case, but I want to cover the basics here.

I will deal entirely with Solr, as an example of several search engines that function in roughly the same way.

I want to start with some hard facts:

  • You cannot rely on Solr/Lucene as your safe, primary database. There are several reasons, mostly missing recovery options, the lack of ACID transactions, possible complications, etc. If you decide to use Solr, you need to populate your index from another source, such as an SQL table. In fact, Solr is well suited to storing documents that combine data from several tables and relations, which would otherwise require complex joins.
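As a minimal sketch of populating an index from a relational source, the snippet below flattens a product and its related distributors into one document per product. Everything here is an assumption for illustration: the table and field names are hypothetical, and an in-memory SQLite database stands in for MySQL so the example is self-contained. Sending the resulting documents to Solr is left out.

```python
import json
import sqlite3

# In-memory SQLite stands in for the MySQL source of truth (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
    CREATE TABLE distributor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_distributor (product_id INTEGER, distributor_id INTEGER);
    INSERT INTO product VALUES
        (1, 'Brake pad', 'Ceramic brake pad'),
        (2, 'Oil filter', 'Standard oil filter');
    INSERT INTO distributor VALUES (10, 'Acme'), (11, 'Bolt & Co');
    INSERT INTO product_distributor VALUES (1, 10), (1, 11), (2, 10);
""")


def build_documents(conn):
    """Flatten each product and its distributors into one search document."""
    products = conn.execute(
        "SELECT id, name, description FROM product ORDER BY id"
    ).fetchall()
    docs = []
    for product_id, name, description in products:
        rows = conn.execute(
            "SELECT d.name FROM distributor d "
            "JOIN product_distributor pd ON pd.distributor_id = d.id "
            "WHERE pd.product_id = ? ORDER BY d.id",
            (product_id,),
        ).fetchall()
        docs.append({
            "id": product_id,
            "name": name,
            "description": description,
            # Multivalued field: the join is resolved once, at index time.
            "distributors": [row[0] for row in rows],
        })
    return docs


docs = build_documents(conn)
print(json.dumps(docs, indent=2))
```

The join is paid once when the index is built, so searches never have to touch the relational tables; the price is that the index must be rebuilt or updated whenever the source rows change.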

Solr/Lucene provides mind-blowing text analysis, stemming, full-text search scoring, and fuzzy matching: things you simply cannot do with MySQL. In fact, full-text search in MySQL was limited to MyISAM until MySQL 5.6 added InnoDB FULLTEXT indexes, and its scoring is very simple and limited. Weighting fields, boosting documents on certain metrics, and scoring results based on phrase proximity, matching accuracy, etc. range from very hard work to almost impossible.
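To make the field weighting concrete, here is a small sketch that builds request parameters for Solr's eDisMax query parser. The parameter names (`defType`, `qf`, `pf`, `hl`, `hl.fl`) are standard Solr parameters; the field names `title` and `description` and the boost values are assumptions chosen for illustration, not a recommendation.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(user_text):
    """Return Solr query parameters with per-field weights and highlighting."""
    return {
        "q": user_text,
        "defType": "edismax",        # extended DisMax query parser
        "qf": "title^3 description",  # a title match counts three times as much
        "pf": "title^5",             # extra boost when the terms form a phrase in the title
        "hl": "true",                # turn on hit highlighting
        "hl.fl": "description",      # highlight snippets from this field
        "rows": 10,
    }


# The query string would be appended to a core's /select handler URL,
# which depends on the installation and is therefore not shown here.
query_string = urlencode(build_search_params("brake pad"))
print(query_string)
```

Tuning relevance then becomes a matter of changing a few parameters per request rather than rewriting SQL.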

In Solr/Lucene you have documents. You cannot really store and process relations. You can, of course, index the keys of other documents in a multivalued field of a document, which lets you store 1:n relations, and do it in both directions to get n:n, but that is data overhead. Don't get me wrong: it's perfectly fine and efficient for a lot of purposes (for example, a product catalog where you want to store the distributors for each product and search only for parts that are available from certain distributors). But you reach the end of the possibilities with HAS / HAS NOT. You can hardly do something like "get all products that are available from at least 3 distributors".
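The contrast can be sketched with toy data. The first part models documents with a multivalued `distributor_ids` field and the HAS / HAS NOT filters that such a field supports; the second part runs the "at least 3 distributors" question as a plain SQL aggregate. The data, field names, and helper functions are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

# Documents with a multivalued field, as they might be indexed (toy data).
documents = [
    {"id": 1, "name": "Brake pad", "distributor_ids": [10, 11, 12]},
    {"id": 2, "name": "Oil filter", "distributor_ids": [10]},
    {"id": 3, "name": "Spark plug", "distributor_ids": [11, 12]},
]


def has(docs, distributor_id):
    """HAS: ids of documents whose multivalued field contains the key."""
    return [d["id"] for d in docs if distributor_id in d["distributor_ids"]]


def has_not(docs, distributor_id):
    """HAS NOT: ids of documents whose multivalued field lacks the key."""
    return [d["id"] for d in docs if distributor_id not in d["distributor_ids"]]


# The same relation in a relational table, where counting is a one-liner.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_distributor (product_id INTEGER, distributor_id INTEGER)"
)
conn.executemany(
    "INSERT INTO product_distributor VALUES (?, ?)",
    [(d["id"], dist) for d in documents for dist in d["distributor_ids"]],
)
at_least_three = [
    row[0]
    for row in conn.execute(
        "SELECT product_id FROM product_distributor "
        "GROUP BY product_id HAVING COUNT(DISTINCT distributor_id) >= 3 "
        "ORDER BY product_id"
    )
]

print(has(documents, 11), has_not(documents, 10), at_least_three)
```

Membership tests map naturally onto a document index, while the aggregate over the relation is where the relational database is the more natural tool.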

Solr/Lucene has very nice faceting features and post-search analysis. For example, after a very broad search with 40,000 hits, you can show that only 3 hits would remain if the search were refined to a particular combination of field values. Things that need additional queries in MySQL are done efficiently and conveniently.
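What facet counts mean can be illustrated in plain Python. This is only a toy model of the idea over an in-memory hit list, not how Solr computes facets internally; in Solr itself, faceting is requested with parameters such as `facet=true` and `facet.field`. The fields `brand` and `color` and the helper names are hypothetical.

```python
from collections import Counter

# Hits of a broad search (toy data).
hits = [
    {"id": 1, "brand": "Acme", "color": "red"},
    {"id": 2, "brand": "Acme", "color": "blue"},
    {"id": 3, "brand": "Bolt", "color": "red"},
    {"id": 4, "brand": "Acme", "color": "red"},
]


def facet_counts(hits, fields):
    """Count, per field, how many hits carry each value."""
    return {field: Counter(hit[field] for hit in hits) for field in fields}


def refine(hits, **filters):
    """Keep only the hits that match every field=value filter."""
    return [h for h in hits if all(h[k] == v for k, v in filters.items())]


counts = facet_counts(hits, ["brand", "color"])
refined = refine(hits, brand="Acme", color="red")
print(counts["brand"], counts["color"], len(refined))
```

The counts tell the user how many hits each refinement would leave before they click on it, which is the information the additional MySQL queries would otherwise have to produce.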

Let's sum it up:

  • The power of Lucene is text searching and analysis. It is also mind-blowingly fast because of its inverted index structure. You can do a lot of post-processing and satisfy other needs. Although it's document-oriented and has no "graph querying" like triple stores offer with SPARQL, basic N:M relations can be stored and queried. If your application is focused on text searching, you should definitely go for Solr/Lucene unless you have good reasons to do otherwise, such as very complex, multi-dimensional range filter queries.
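The index structure behind that speed can be sketched in a few lines. This is a toy inverted index that maps each term to the set of documents containing it; it ignores everything a real Lucene index adds (analysis, stemming, positions, scoring, compression), and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {
    1: "Ceramic brake pad for front wheels",
    2: "Standard oil filter",
    3: "Brake fluid and brake pad kit",
}
index = build_inverted_index(docs)
print(search(index, "brake pad"))
```

A lookup touches only the entries for the query terms instead of scanning every row, which is why text search over such an index stays fast as the collection grows.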

If you do not have text search, but rather something where users point and click instead of entering text, good old relational databases are probably a better way to go.
