Why does Postgres do a sequential scan where the index would return < 1% of the data?

Problem description

I have 19 years of Oracle and MySQL experience (DBA and dev) and I am new to Postgres, so I may be missing something obvious. But I cannot get this query to do what I want.

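Note: this query is running on an EngineYard Postgres instance, and I don't know offhand which parameters it sets. Also, the applicable_type and status columns in the details table are of the extension type citext.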
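
Because the instance's settings are unknown, it can help to check the planner cost parameters that drive the choice between a sequential scan and an index scan. The query below reads the standard `pg_settings` view. It is a minimal sketch, and the list of parameters is simply a selection of the usual suspects. The question does not report any of these values.

```sql
-- Planner settings that most directly influence seq scan vs. index scan decisions.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('seq_page_cost', 'random_page_cost',
               'effective_cache_size', 'work_mem',
               'default_statistics_target')
ORDER BY name;
```

The defaults are `seq_page_cost = 1.0` and `random_page_cost = 4.0`. If the ratio between them is much higher, index lookups look expensive to the planner.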

The following query can take in excess of 60 seconds to return rows:

SELECT items.item_id, 
       CASE when items.sku is null then items.title else concat(items.title, ' (SKU: ', items.sku, ')') END title, 
       items.listing_status, items.updated_at, items.id, 
       items.sku, count(details.id) detail_count 
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id 
                                    and details.applicable_type = 'Item' 
                                    and details.status = 'Valid' 
                LEFT OUTER JOIN products ON products.id = items.product_id
WHERE "items"."user_id" = 3
GROUP BY items.id
ORDER BY title asc
LIMIT 25 OFFSET 0

The details table contains 6.5M rows. The LEFT OUTER JOIN to it on applicable_id does a sequential scan of the whole table. Cardinality-wise, applicable_id has 120K distinct values across those 6.5M rows.

I have a btree index on details with the following columns:

applicable_id
applicable_type
status

But in practice, applicable_id and applicable_type have low cardinality.
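
On the cardinality point, 6.5M rows over 120K distinct applicable_id values works out to about 54 rows per id on average (6,500,000 / 120,000 ≈ 54). Whether the planner sees the same picture depends on the statistics that ANALYZE collected, and the standard `pg_stats` view exposes them. The sketch below runs against the question's `details` table. It simply returns no rows if that table doesn't exist or hasn't been analyzed yet.

```sql
-- Planner statistics for the join and filter columns on details.
-- n_distinct > 0: estimated number of distinct values;
-- n_distinct < 0: minus (distinct values / row count).
SELECT attname, n_distinct, null_frac, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'details'
  AND attname IN ('applicable_id', 'applicable_type', 'status')
ORDER BY attname;

-- Row count estimate the planner works from (refreshed by VACUUM and ANALYZE).
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'details' AND relkind = 'r';
```

If the statistics are far from these figures, a cheap first step is to run `ANALYZE details;`, or to raise the column's statistics target with `ALTER TABLE ... ALTER COLUMN ... SET STATISTICS`.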

My EXPLAIN ANALYZE output looks like this:

Limit  (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1)
  ->  Sort  (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1)
      Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END)
      Sort Method: top-N heapsort  Memory: 30kB
      ->  HashAggregate  (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1)
          ->  Hash Right Join  (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1)
                Hash Cond: (details.applicable_id = items.id)
                ->  Seq Scan on details  (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1)
                      Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext))
                      Rows Removed by Filter: 942
                ->  Hash  (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1)
                      Buckets: 1024  Batches: 1  Memory Usage: 90kB
                      ->  Bitmap Heap Scan on items  (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1)
                            Recheck Cond: (user_id = 3)
                            ->  Bitmap Index Scan on index_items_on_user_id  (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1)
                                  Index Cond: (user_id = 3)

Total runtime: 28781.238 ms
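
The figures in the plan account for the "< 1%" in the title:

- The `applicable_type`/`status` filter keeps almost every row of details.
- The join to user 3's items produces well under 1% of the filtered rows.
- The planner underestimated the size of the join.

All of the numbers below are copied from the plan.

```latex
\begin{aligned}
\text{filter pass rate} &= \frac{6\,646\,205}{6\,646\,205 + 942} \approx 99.99\% \\
\frac{\text{join output rows}}{\text{filtered details rows}} &= \frac{60\,047}{6\,646\,205} \approx 0.90\% \\
\frac{\text{actual join rows}}{\text{estimated join rows}} &= \frac{60\,047}{6\,425} \approx 9.3
\end{aligned}
```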

Recommended answer

Do you have an index on the expression that yields the title? Better yet, one on (user_id, title_expression).
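
Below is a minimal sketch of the index the answer describes. It uses a hypothetical temporary table, `items_demo`, with assumed column types, because the question doesn't give the DDL for `items`.

One caveat applies. `concat()` is marked STABLE rather than IMMUTABLE, so PostgreSQL refuses it inside an index expression. The sketch therefore builds the same string with `||` on explicit `::text` casts, which is immutable. That matches `concat()` only when `title` is never NULL, which the sketch assumes. The query must also sort by exactly the indexed expression for the index to supply the order.

```sql
-- Hypothetical stand-in for the question's items table; column types are assumptions.
CREATE TEMP TABLE items_demo (
    id      bigserial PRIMARY KEY,
    user_id integer      NOT NULL,
    title   varchar(255) NOT NULL,
    sku     varchar(64)
);

-- (user_id, title expression): equality on user_id, then rows come out in title order.
-- concat() is STABLE and not allowed here; text || text is IMMUTABLE.
CREATE INDEX items_demo_user_title_idx ON items_demo (
    user_id,
    (CASE WHEN sku IS NULL THEN title::text
          ELSE title::text || ' (SKU: ' || sku::text || ')' END)
);

-- The ORDER BY repeats the indexed expression (via its alias) so the index can provide the order.
SELECT id,
       CASE WHEN sku IS NULL THEN title::text
            ELSE title::text || ' (SKU: ' || sku::text || ')' END AS display_title
FROM items_demo
WHERE user_id = 3
ORDER BY display_title
LIMIT 25 OFFSET 0;
```

On the real schema, the same index would go on `items`. Use EXPLAIN to confirm that the planner actually picks it.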

If not, that would be an excellent thing to add: with it, Postgres could nestloop through the first 25 rows of an index scan. Without it, Postgres can't reasonably guess which 25 rows will be needed, hence the seq scan you're currently getting on the joined table.
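
The nested-loop idea pays off most when the 25 rows are picked before any aggregation. In the plan above, by contrast, every one of user 3's items is joined and grouped before the sort and LIMIT run.

One way to make this explicit is the rewrite below. It is my own suggestion, not part of the answer:

- Page through the items in a subquery.
- Count details per item with a correlated subquery, which can use the (applicable_id, applicable_type, status) index.

The sketch uses hypothetical temp tables `sketch_items` and `sketch_details`, with plain `text` standing in for `citext`. The unused `products` join is left out, on the assumption that `products.id` is unique, so that dropping it can't change the counts.

```sql
-- Hypothetical stand-ins for the question's items and details tables (types assumed).
CREATE TEMP TABLE sketch_items (
    id      bigserial PRIMARY KEY,
    user_id integer      NOT NULL,
    title   varchar(255) NOT NULL,
    sku     varchar(64)
);
CREATE TEMP TABLE sketch_details (
    id              bigserial PRIMARY KEY,
    applicable_id   bigint NOT NULL,
    applicable_type text   NOT NULL,  -- citext in the real schema
    status          text   NOT NULL   -- citext in the real schema
);
CREATE INDEX ON sketch_details (applicable_id, applicable_type, status);

-- Step 1 (inner query): choose the page of 25 items; an index on (user_id, title expression) can serve this.
-- Step 2 (outer query): count details only for those 25 ids.
SELECT page.id,
       page.sku,
       page.display_title,
       (SELECT count(*)
          FROM sketch_details d
         WHERE d.applicable_id = page.id
           AND d.applicable_type = 'Item'
           AND d.status = 'Valid') AS detail_count
FROM (
    SELECT i.id, i.sku,
           CASE WHEN i.sku IS NULL THEN i.title::text
                ELSE i.title::text || ' (SKU: ' || i.sku::text || ')' END AS display_title
    FROM sketch_items i
    WHERE i.user_id = 3
    ORDER BY display_title
    LIMIT 25 OFFSET 0
) AS page
ORDER BY page.display_title;  -- re-sort: subquery order is not guaranteed to survive
```

The remaining item columns (item_id, listing_status, updated_at) can be added to the inner select list in the same way.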
