如果在该字段上构建b-tree索引,PostgreSQL如何执行ORDER BY? [英] How does PostgreSQL perform ORDER BY if a b-tree index is built on that field?

查看:132
本文介绍了如果在该字段上构建b-tree索引,PostgreSQL如何执行ORDER BY?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张桌子 bsort

CREATE TABLE bsort(a int, data text);

此处数据可能不完整。换句话说,一些元组可能没有数据值。

Here data may be incomplete. In other words, some tuples may not have data value.

然后我构建一个b树索引表:

And then I build a b-tree index on the table:

CREATE INDEX ON bsort USING BTREE(a);

现在,如果我执行此查询:

Now if I perform this query:

SELECT * FROM bsort ORDER BY a;

PostgreSQL是否使用nlogn复杂性对元组进行排序,或者是否直接从b-tree索引获取订单?

Does PostgreSQL sort tuples with nlogn complexity, or does it get the order directly from the b-tree index?

推荐答案

对于像这样的简单查询,Postgres将使用 索引扫描 并按顺序从索引中检索随意排序的元组 。由于其 MVCC模型,Postgres必须始终 另外访问堆(数据页)以验证条目实际上是可见到当前事务。在仅索引扫描上引用 Postgres Wiki

For a simple query like this Postgres will use an index scan and retrieve readily sorted tuples from the index in order. Due to its MVCC model Postgres had to always visit the "heap" (data pages) additionally to verify entries are actually visible to the current transaction. Quoting the Postgres Wiki on index-only scans:


PostgreSQL索引不包含可见性信息。也就是说,
不能直接确定当前事务是否对
可见任何给定元组,这就是为什么它只花了这么长时间才能实现仅索引的
扫描。

PostgreSQL indexes do not contain visibility information. That is, it is not directly possible to ascertain if any given tuple is visible to the current transaction, which is why it has taken so long for index-only scans to be implemented.

最终发生在版本 9.2 仅索引扫描文档:

Which finally happened in version 9.2: index-only scans. The documentation:


如果索引存储原始索引数据值(而不是某些
有损表示),则支持仅索引
扫描很有用,其中索引返回堆元组的实际数据而不仅仅是
TID 。这仅在可见性图显示
TID 位于全部可见页面上时才有效;否则必须访问堆元组
以检查MVCC可见性。

If the index stores the original indexed data values (and not some lossy representation of them), it is useful to support index-only scans, in which the index returns the actual data not just the TID of the heap tuple. This will only work if the visibility map shows that the TID is on an all-visible page; else the heap tuple must be visited anyway to check MVCC visibility.

所以现在它取决于 <该表的href =http://www.postgresql.org/docs/current/interactive/storage-vm.html\"rel =nofollow noreferrer>可见性图 ,无论是索引 - 只能扫描。如果索引中包含所有涉及的列,则只有一个选项。否则,在任何情况下都必须访问堆(另外)。 仍然不需要排序步骤。

这就是为什么我们现在有时会将无用的列附加到索引中。与示例中的 data 列类似:

That's why we sometimes append otherwise useless columns to indexes now. Like the data column in your example:

CREATE INDEX ON bsort USING BTREE(a, data);

它使索引更大(取决于)并且维护和用于其他目的的成本更高一些不允许仅索引扫描。因此,如果只从中获取仅索引扫描,则只附加数据列。索引中列的顺序很重要:

It makes the index bigger (depends) and a bit more expensive to maintain and use for other purposes that do not allow an index-only scan. So only append the data column if you get index-only scans out of it. The order of columns in the index is important:

  • Working of indexes in PostgreSQL
  • Is a composite index also good for queries on the first field?

仅索引扫描的好处,每份文件:


如果知道页面上的所有元组都可见,则可以跳过堆获取
。这在大型数据集中最为明显,其中
可见性映射可以阻止磁盘访问。可见性映射比堆小很多
,因此即使堆
非常大,它也可以很容易地被缓存。

If it's known that all tuples on the page are visible, the heap fetch can be skipped. This is most noticeable on large data sets where the visibility map can prevent disk accesses. The visibility map is vastly smaller than the heap, so it can easily be cached even when the heap is very large.

可见性图由 VACUUM 如果你有 autovacuum 正在运行(现代Postgres中的默认设置)。详情:

The visibility map is maintained by VACUUM which happens automatically if you have autovacuum running (the default setting in modern Postgres). Details:

  • Are regular VACUUM ANALYZE still recommended under 9.1?

但是对表的写操作与下一个<$ c $之间存在一些延迟c> VACUUM 运行。它的要点:

But there is some delay between write operations to the table and the next VACUUM run. The gist of it:


  • 一旦真空吸尘,只读表就可以进行仅索引扫描了。

  • 已修改的数据页在可见性映射中丢失其全可见标志,直到下一个 VACUUM (以及所有旧的tansactions完成),因此它依赖于写入操作与 VACUUM 频率之间的比率。

  • Read-only tables stay ready for index-only scans once vacuumed.
  • Data pages that have been modified lose their "all-visible" flag in the visibility map until the next VACUUM (and all older tansactions being finished), so it depends on the ratio between write operations and VACUUM frequency.

仅部分索引扫描如果所涉及页面的某些标记为全部可见,则仍然可以。但是如果必须访问堆,则访问方法索引扫描会稍微便宜一些。因此,如果当前有太多页面变脏,Postgres将完全切换到更便宜的索引扫描。 再次发布Postgres Wiki

Partial index-only scans are still possible if some of the involved pages are marked all-visible. But if the heap has to be visited anyway, the access method "index scan" is a bit cheaper. So if too many pages are currently dirty, Postgres will switch to the cheaper index scan altogether. The Postgres Wiki again:


随着计划人员预计需要
的堆取料(或访问)数量增加,计划员最终将得出结论
表示不能进行仅索引扫描,因为根据其成本模型,它不是最便宜的
可能计划。仅索引
扫描的价值完全取决于它们是否允许我们忽略堆访问
(如果只是部分)并最小化I / O.

As the number of heap fetches (or "visits") that are projected to be needed by the planner goes up, the planner will eventually conclude that an index-only scan isn't desirable, as it isn't the cheapest possible plan according to its cost model. The value of index-only scans lies wholly in their potential to allow us to elide heap access (if only partially) and minimise I/O.

这篇关于如果在该字段上构建b-tree索引,PostgreSQL如何执行ORDER BY?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆