Why is Solr so much faster than Postgres?

Published: 2020/5/4 7:32:48. Tags: performance, postgresql, solr, lucene, rdbms


Question

I recently switched from Postgres to Solr and saw a ~50x speed up in our queries. The queries we run involve multiple ranges, and our data is vehicle listings. For example: "Find all vehicles with mileage < 50,000, $5,000 < price < $10,000, make=Mazda..."

I created indices on all the relevant columns in Postgres, so it should be a pretty fair comparison. Looking at the query plan in Postgres, though, it was still using just a single index and then scanning (I assume because it couldn't make use of all the different indices).

As I understand it, Postgres and Solr use vaguely similar data structures (B-trees), and they both cache data in-memory. So I'm wondering where such a large performance difference comes from.

What differences in architecture would explain this?
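To make the plan described in the question concrete, here is a minimal Python sketch of "use one index, then scan". It is a toy model with hypothetical data (the `Listing` class, the sample rows and the `query` function are all illustrative), not how Postgres executes queries internally. A sorted list of `(price, row_id)` pairs stands in for a B-tree index on `price`. The index narrows the candidates, and every candidate is then checked row by row for `mileage` and `make`, so the scan cost grows with the number of rows matching the indexed predicate, however selective the other predicates are.

```python
import bisect
from dataclasses import dataclass

@dataclass
class Listing:
    make: str
    price: int
    mileage: int

# Hypothetical sample rows, for illustration only.
listings = [
    Listing("Mazda", 7_500, 42_000),
    Listing("Mazda", 12_000, 30_000),
    Listing("Honda", 8_000, 20_000),
    Listing("Mazda", 6_000, 80_000),
    Listing("Mazda", 9_900, 10_000),
]

# Stand-in for a B-tree index on price: (price, row_id) pairs in sorted order.
price_index = sorted((l.price, i) for i, l in enumerate(listings))

def query(min_price, max_price, max_mileage, make):
    # Step 1: use the single price index to find rows with min_price < price < max_price.
    lo = bisect.bisect_right(price_index, (min_price, float("inf")))
    hi = bisect.bisect_left(price_index, (max_price, -1))
    candidates = [row for _, row in price_index[lo:hi]]
    # Step 2: scan the candidates and check the remaining predicates one row at a time.
    return [row for row in candidates
            if listings[row].mileage < max_mileage and listings[row].make == make]

print(query(5_000, 10_000, 50_000, "Mazda"))  # [0, 4]
```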

Answer

First, Solr doesn't use B-trees. A Lucene index (Lucene is the underlying library used by Solr) is made of read-only segments. For each segment, Lucene maintains a term dictionary: the list of terms that appear in the segment, sorted lexicographically. Looking up a term in this dictionary is done with a binary search, so the cost of a single-term lookup is O(log(t)), where t is the number of distinct terms. By contrast, using the index of a standard RDBMS costs O(log(d)), where d is the number of documents. When many documents share the same value for some field, t is much smaller than d, so this can be a big win.
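A toy Python sketch of the term-dictionary lookup described above, for one field of one segment. The names `build_segment` and `lookup` are hypothetical, and real Lucene segments keep the term dictionary and postings lists in compressed on-disk formats, so this only mirrors the shape of the idea: a binary search over the t sorted distinct terms, then reading that term's list of matching document ids.

```python
import bisect
from collections import defaultdict

def build_segment(docs, field):
    """Build a toy term dictionary for one field of a read-only segment.

    Returns (terms, postings): `terms` is the sorted list of distinct values,
    and postings[k] is the sorted list of doc ids whose field equals terms[k].
    """
    by_term = defaultdict(list)
    for doc_id, doc in enumerate(docs):
        by_term[doc[field]].append(doc_id)
    terms = sorted(by_term)
    postings = [by_term[t] for t in terms]
    return terms, postings

def lookup(terms, postings, term):
    # Binary search over the t distinct terms: O(log t), independent of the doc count.
    k = bisect.bisect_left(terms, term)
    if k < len(terms) and terms[k] == term:
        return postings[k]
    return []

docs = [{"make": "Mazda"}, {"make": "Honda"}, {"make": "Mazda"}, {"make": "Toyota"}]
terms, postings = build_segment(docs, "make")
print(lookup(terms, postings, "Mazda"))  # [0, 2]
```

With millions of listings but only a few dozen makes, the binary search runs over those few dozen terms rather than over every row.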

Moreover, Lucene committer Uwe Schindler added support for very performant numeric range queries a few years ago. For every value of a numeric field, Lucene stores several values with different precisions. This allows Lucene to run range queries very efficiently. Since your use case seems to rely heavily on numeric range queries, this may explain why Solr is so much faster. (For more information, read the javadocs, which are very interesting and link to relevant research papers.)
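A minimal Python sketch of the multi-precision idea, assuming non-negative 16-bit values and a precision step of 4 bits (both hypothetical choices; Lucene's real term encoding and defaults are different). Each value is indexed once per precision level as a `(shift, value >> shift)` term. A range is then split into a few coarse terms for its aligned middle plus fine terms at its edges, so a query touches far fewer terms than there are distinct values in the range. The names `index_terms`, `split_range`, `build_index` and `range_query` are illustrative, not Lucene APIs.

```python
from collections import defaultdict

STEP = 4   # hypothetical precision step, in bits
BITS = 16  # hypothetical value width: values must lie in [0, 2**16)

def index_terms(value, step=STEP, bits=BITS):
    # One term per precision level: full precision, then with the lowest
    # `step` bits dropped, then 2*step bits dropped, and so on.
    return [(shift, value >> shift) for shift in range(0, bits, step)]

def split_range(lo, hi, step=STEP, bits=BITS):
    """Cover the inclusive range [lo, hi] with disjoint (shift, prefix) terms."""
    terms = []
    shift = 0
    while lo <= hi:
        size = 1 << step
        # Prefixes at the next (coarser) level whose whole block lies inside [lo, hi].
        next_lo = (lo + size - 1) >> step
        next_hi = ((hi + 1) >> step) - 1
        if shift + step >= bits or next_lo > next_hi:
            # No coarser level helps: emit the remaining prefixes at this level.
            terms.extend((shift, p) for p in range(lo, hi + 1))
            break
        # Emit the unaligned edges at this level, then move up one level.
        terms.extend((shift, p) for p in range(lo, next_lo << step))
        terms.extend((shift, p) for p in range((next_hi + 1) << step, hi + 1))
        lo, hi, shift = next_lo, next_hi, shift + step
    return terms

def build_index(values):
    # Inverted index: term -> doc ids (doc id = position in `values`).
    index = defaultdict(list)
    for doc_id, value in enumerate(values):
        for term in index_terms(value):
            index[term].append(doc_id)
    return index

def range_query(index, lo, hi):
    # The terms are disjoint, so each matching doc is found exactly once.
    hits = []
    for term in split_range(lo, hi):
        hits.extend(index.get(term, []))
    return sorted(hits)

prices = [7_500, 12_000, 8_000, 6_000, 9_900, 4_999, 10_000]
index = build_index(prices)
print(range_query(index, 5_000, 9_999))  # [0, 2, 3, 4]
print(len(split_range(5_000, 9_999)))    # 35
```

Under these assumptions, the price range 5,000 to 9,999 is covered by 35 terms instead of up to 5,000 exact values. The trade-off is a larger index, since every value is stored once per precision level.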

But Solr can only do this because it doesn't have all the constraints that an RDBMS has. For example, Solr is very bad at updating a single document at a time (it prefers batch updates).
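A toy Python sketch of why read-only segments favour batch updates. It assumes a deliberately simplified model (the `Segment` and `Index` classes are hypothetical): a segment can never be modified, so "updating" a document means marking the old copy as deleted and writing the new version into a fresh segment. Each search then has to visit every segment and skip deleted documents. Real Lucene buffers documents in memory before flushing and merges small segments in the background, so the cost is amortized; the sketch only shows where the extra work comes from.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Segment:
    """A read-only segment: once written, its documents never change."""
    docs: tuple

@dataclass
class Index:
    segments: list = field(default_factory=list)
    deleted: set = field(default_factory=set)  # (segment_no, local_doc_id) marked deleted

    def add_batch(self, docs):
        # Each flush writes exactly one new immutable segment.
        self.segments.append(Segment(tuple(docs)))

    def update(self, key, new_doc):
        # No in-place edits: mark old copies deleted, then write a one-document segment.
        for seg_no, seg in enumerate(self.segments):
            for local_id, doc in enumerate(seg.docs):
                if doc["id"] == key:
                    self.deleted.add((seg_no, local_id))
        self.add_batch([new_doc])

    def search(self, predicate):
        # Every query visits every segment and skips deleted documents.
        return [doc for seg_no, seg in enumerate(self.segments)
                for local_id, doc in enumerate(seg.docs)
                if (seg_no, local_id) not in self.deleted and predicate(doc)]

idx = Index()
idx.add_batch([{"id": i, "price": 1000 * i} for i in range(1, 6)])  # one segment
for i in range(1, 6):
    idx.update(i, {"id": i, "price": 999})  # each single-doc update adds a segment
print(len(idx.segments))  # 6
```

Sending the five new versions as one batch would have produced a single extra segment instead of five.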
