Postgres选择BTREE而不是BRIN索引 [英] Postgres choosing BTREE instead of BRIN index

查看:414
本文介绍了Postgres选择BTREE而不是BRIN索引的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在运行Postgres 9.5并正在玩BRIN索引。我有一个约1.5亿行的事实表,我试图让PG使用BRIN索引。我的查询是:

I'm running Postgres 9.5 and am playing around with BRIN indexes. I have a fact table with about 150 million rows and I'm trying to get PG to use a BRIN index. My query is:

select sum(transaction_amt), 
       sum (total_amt) 
from fact_transaction 
where transaction_date_key between 20170101 and 20170201 

我创建了BTREE索引和BRIN索引(默认pages_per_range列transaction_date_key的值为128)(上述查询指的是2017年1月至2月)。我原以为PG会选择使用BRIN索引,但它会与BTREE索引一致。以下是解释计划:

I created both a BTREE index and a BRIN index (default pages_per_range value of 128) on column transaction_date_key (the above query is referring to January to February 2017). I would have thought that PG would choose to use the BRIN index however it goes with the BTREE index. Here is the explain plan:

https:// explain.depesz.com/s/uPI

然后我删除了BTREE索引,对表进行了真空/分析,并重新运行了查询并确实选择BRIN索引,但运行时间要长得多:

I then deleted the BTREE index, did a vacuum / analyze on the the table, and re-ran the query and it did choose the BRIN index however the run time was considerably longer:

https://explain.depesz.com/s/5VXi

实际上我的测试是使用BTREE索引而不是BRIN索引时更快。我认为它应该是相反的?

In fact my tests were all faster when using the BTREE index rather than the BRIN index. I thought it was supposed to be the opposite?

我更喜欢使用BRIN索引,因为它的尺寸较小但我似乎无法使用PG它。

I'd prefer to use the BRIN index because of its smaller size however I can't seem to get PG to use it.

注意:我从2017年1月开始到2017年6月(通过transaction_date_key定义)加载数据,因为我读到物理表排序在使用BRIN索引时有所不同。

Note: I loaded the data, starting from January 2017 through to June 2017 (defined via transaction_date_key) as I read that physical table ordering makes a difference when using BRIN indexes.

有谁知道为什么PG选择使用BTREE索引以及为什么BRIN在我的情况下要慢得多?

Does anyone know why PG is choosing to use the BTREE index and why BRIN is so much slower in my case?

推荐答案

似乎BRIN索引扫描不是很有选择性–它返回3000万行,所有这些都必须重新检查,这是花费时间的地方。

It seems like the BRIN index scan is not very selective – it returns 30 million rows, all of which have to be re-checked, which is where the time is spent.

这可能意味着 transaction_date_key 与表中行的物理位置没有很好的相关性。

That probably means that transaction_date_key is not well correlated with the physical location of the rows in the table.

A BRIN索引通过lumping together&rdquo工作表块的范围(可以使用存储参数 pages_per_range 配置多少个,其默认值为128)。存储块的eatch范围的索引值的最大值和最小值。

A BRIN index works by “lumping together” ranges of table blocks (how many can be configured with the storage parameter pages_per_range, whose default value is 128). The maximum and minimum of the indexed value for eatch range of blocks is stored.

因此,表中的许多块范围包含 transaction_date_key 20170101 20170201 之间,并且必须扫描所有这些块以计算查询结果。

So a lot of block ranges in your table contain transaction_date_key between 20170101 and 20170201, and all of these blocks have to be scanned to compute the query result.

我看到两种方法可以改善这种情况:

I see two options to improve the situation:


  • 降低 pages_per_range 存储参数。这将使指数更大,但它会减少误报的数量。块。

  • Lower the pages_per_range storage parameter. That will make the index bigger, but it will reduce the number of “false positive” blocks.

transaction_date_key 属性上对表进行集群。如您所知,这需要(至少暂时)列上的B树索引。

Cluster the table on the transaction_date_key attribute. As you have found out, that requires (at least temporarily) a B-tree index on the column.

这篇关于Postgres选择BTREE而不是BRIN索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆