Efficient PostgreSQL query on timestamp using index or bitmap index scan?

Question

In PostgreSQL, I have an index on a date field on my tickets table. When I compare the field against now(), the query is pretty efficient:

# explain analyze select count(1) as count from tickets where updated_at > now();
                                                             QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1)
   ->  Index Scan using tickets_updated_at_idx on tickets  (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1)
         Index Cond: (updated_at > now())
Total runtime: 33.271 ms

It goes downhill and uses a Bitmap Heap Scan if I try to compare it against now() minus an interval.

# explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval);
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1)
->  Bitmap Heap Scan on tickets  (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1)
     Recheck Cond: (updated_at > (now() - '24:00:00'::interval))
     ->  Bitmap Index Scan on tickets_updated_at_idx  (cost=0.00..20847.74 rows=897368 width=0)     (actual time=238.799..238.799 rows=924699 loops=1)
           Index Cond: (updated_at > (now() - '24:00:00'::interval))
Total runtime: 543.952 ms

Is there a more efficient way to query using date arithmetic?

Recommended answer

The 1st query expects to find rows=74, but actually finds rows=40250.
The 2nd query expects to find rows=897368 and actually finds rows=924699.

Of course, processing 23 x as many rows takes considerably more time. So your actual times are not surprising.
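
As a rough check of that factor, the figures reported in the two plans above can be divided directly. The numbers are copied from the EXPLAIN ANALYZE output in the question; nothing else is assumed.

```sql
-- Rows actually found by the index scans: second query vs. first query.
SELECT round(924699 / 40250.0, 1) AS row_ratio;       -- roughly 23

-- Total runtimes of the two queries, in milliseconds.
SELECT round(543.952 / 33.271, 1) AS runtime_ratio;   -- roughly 16
```

So the second query handles about 23 times the rows in about 16 times the time.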

Statistics for data with updated_at > now() are outdated. Run:

ANALYZE tickets;

and repeat your queries. And you seriously have data with updated_at > now()? That sounds wrong.

It's not surprising, however, that statistics are outdated for data most recently changed. That's in the logic of things. If your query depends on current statistics, you have to run ANALYZE before you run your query.
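
To judge how stale the statistics might be, one option is to look at when the table was last analyzed. This is a sketch that assumes the table is named tickets, as in the question; the timestamps come from the pg_stat_user_tables statistics view.

```sql
-- When were statistics for the table last gathered, manually or by autovacuum?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tickets';

-- Refresh the statistics, then compare estimated and actual row counts again.
ANALYZE tickets;

EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > now();
```

If the estimated rows= value in the plan moves close to the actual one after ANALYZE, stale statistics were the cause of the misestimate.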

Also test with (in your session only):

SET enable_bitmapscan = off;

and repeat your second query to see times without bitmap index scan.
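
A variant of the same test, sketched here as one possible approach, wraps the setting in a transaction with SET LOCAL so it reverts automatically when the transaction ends. The query is the second one from the question.

```sql
BEGIN;

-- Discourage bitmap scans for this transaction only.
SET LOCAL enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

-- Ending the transaction restores the previous setting.
ROLLBACK;
```

Comparing the total runtime here with the bitmap plan shows whether the planner's choice was actually the faster one for this data.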

A plain index scan fetches rows from the heap sequentially as found in the index. That's simple, dumb and without overhead. Fast for few rows, but may end up more expensive than a bitmap index scan with a growing number of rows.

A bitmap index scan collects rows from the index before looking up the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance that a bitmap index scan will save time.

For even more rows (around 5% of the table, heavily depends on actual data), the planner switches to a sequential scan of the table and doesn't use the index at all.

The optimum would be an index-only scan, introduced with Postgres 9.2. That's only possible if some preconditions are met: if all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on a data page are visible to all transactions, then that page doesn't have to be fetched from the heap (the table) and the information in the index is enough.
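
A minimal sketch of how one might check whether an index-only scan is available here, assuming Postgres 9.2 or later and the tickets table and tickets_updated_at_idx index from the question:

```sql
-- VACUUM updates the visibility map; ANALYZE refreshes the statistics.
VACUUM ANALYZE tickets;

-- relallvisible is the number of pages marked all-visible, out of relpages.
SELECT relpages, relallvisible
FROM   pg_class
WHERE  relname = 'tickets';

-- The count needs no column beyond updated_at, so the index alone can answer it.
EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);
```

If the plan shows an Index Only Scan node, its Heap Fetches line reports how many rows still had to be looked up in the table. On a heavily updated table the visibility map may lag behind, so the benefit can be smaller than hoped.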

The decision depends on your statistics (how many rows Postgres expects to find and their distribution) and on cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.
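
To see what the planner is currently working with, the settings named above can be read from pg_settings and changed for one session as an experiment. The value 2.0 below is an arbitrary illustration, not a recommendation.

```sql
-- Current values of the cost settings mentioned above.
SELECT name, setting, unit
FROM   pg_settings
WHERE  name IN ('random_page_cost', 'cpu_index_tuple_cost', 'effective_cache_size');

-- Session-only experiment: make random page reads look cheaper to the planner.
SET random_page_cost = 2.0;

EXPLAIN ANALYZE
SELECT count(*) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

-- Return to the configured default.
RESET random_page_cost;
```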
