Why is Solr so much faster than Postgres?

Question

I recently switched from Postgres to Solr and saw a ~50x speed up in our queries. The queries we run involve multiple ranges, and our data is vehicle listings. For example: "Find all vehicles with mileage < 50,000, $5,000 < price < $10,000, make=Mazda..."

I created indices on all the relevant columns in Postgres, so it should be a pretty fair comparison. Looking at the query plan in Postgres, though, it was still using just a single index and then scanning (I assume because it couldn't make use of all the different indices).

As I understand it, Postgres and Solr use vaguely similar data structures (B-trees), and they both cache data in-memory. So I'm wondering where such a large performance difference comes from.

What differences in architecture would explain this?

Recommended answer

First, Solr doesn't use B-trees. A Lucene (the underlying library used by Solr) index is made of read-only segments. For each segment, Lucene maintains a term dictionary, which consists of the list of terms that appear in the segment, lexicographically sorted. Looking up a term in this term dictionary is done using a binary search, so the cost of a single-term lookup is O(log(t)) where t is the number of terms. In contrast, using the index of a standard RDBMS costs O(log(d)) where d is the number of documents. When many documents share the same value for some field, this can be a big win.
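To make the O(log(t)) point concrete, here is a minimal sketch of a term dictionary with postings lists. It is a toy model, not Lucene's actual on-disk format; the `Segment` class and the sample data are hypothetical names made up for illustration.

```python
from bisect import bisect_left


class Segment:
    """Toy read-only segment: a sorted term dictionary plus postings lists.

    Hypothetical illustration only; Lucene's real structures are more elaborate.
    """

    def __init__(self, docs):
        # docs maps doc id -> term (one field value per document).
        postings = {}
        for doc_id, term in sorted(docs.items()):
            postings.setdefault(term, []).append(doc_id)
        # Terms are stored once each, lexicographically sorted.
        self.terms = sorted(postings)
        self.postings = [postings[t] for t in self.terms]

    def lookup(self, term):
        # Binary search over distinct terms: O(log t), independent of
        # how many documents share the term.
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return []


docs = {0: "mazda", 1: "toyota", 2: "mazda", 3: "honda", 4: "mazda"}
segment = Segment(docs)
print(segment.terms)
print(segment.lookup("mazda"))
```

Five documents collapse into three dictionary entries here, so the search runs over three terms and the matching documents come back as one precomputed list.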

Moreover, Lucene committer Uwe Schindler added support for very performant numeric range queries a few years ago. For every value of a numeric field, Lucene stores several values with different precisions. This allows Lucene to run range queries very efficiently. Since your use-case seems to leverage numeric range queries a lot, this may explain why Solr is so much faster. (For more information, read the javadocs which are very interesting and give links to relevant research papers.)

But Solr can only do this because it doesn't have all the constraints that an RDBMS has. For example, Solr is very bad at updating a single document at a time (it prefers batch updates).
