为什么Postgres不在简单的GROUP BY上使用索引? [英] Why is Postgres not using index on a simple GROUP BY?

查看：788 发布时间：2020/11/21 0:19:43 postgresql indexing group-by

本文介绍了为什么Postgres不在简单的GROUP BY上使用索引?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我创建了一个具有type列索引的3,600万行表:

I have created a 36M rows table with an index on type column:

CREATE TABLE items AS
  SELECT
    (random()*36000000)::integer AS id,
    (random()*10000)::integer AS type,
    md5(random()::text) AS s
  FROM
    generate_series(1,36000000);
CREATE INDEX items_type_idx ON items USING btree ("type");

我运行此简单查询，并期望postgresql使用我的索引:

I run this simple query and expect postgresql to use my index:

explain select count(*) from "items" group by "type";

但是查询计划者决定改为使用Seq扫描:

But the query planner decides to use Seq Scan instead:

HashAggregate  (cost=734592.00..734627.90 rows=3590 width=12) (actual time=6477.913..6478.344 rows=3601 loops=1)
  Group Key: type
  ->  Seq Scan on items  (cost=0.00..554593.00 rows=35999800 width=4) (actual time=0.044..1820.522 rows=36000000 loops=1)
Planning time: 0.107 ms
Execution time: 6478.525 ms

没有解释的时间: 5s 979ms

Time without EXPLAIN: 5s 979ms

我从此处尝试了几种解决方案和此处:

I have tried several solutions from here and here:

运行VACUUM ANALYZE或VACUUM ANALYZE
配置default_statistics_target，random_page_cost，work_mem

Run VACUUM ANALYZE or VACUUM ANALYZE
Configure default_statistics_target, random_page_cost, work_mem

但设置enable_seqscan = OFF没什么帮助:

SET enable_seqscan = OFF;
explain select count(*) from "items" group by "type";

GroupAggregate  (cost=0.56..1114880.46 rows=3590 width=12) (actual time=5.637..5256.406 rows=3601 loops=1)
  Group Key: type
  ->  Index Only Scan using items_type_idx on items  (cost=0.56..934845.56 rows=35999800 width=4) (actual time=0.074..2783.896 rows=36000000 loops=1)
        Heap Fetches: 0
Planning time: 0.103 ms
Execution time: 5256.667 ms

没有解释的时间: 659ms

在我的计算机上使用索引扫描进行查询的速度大约快10倍.

Query with index scan is about 10x faster on my machine.

有没有比设置enable_seqscan更好的解决方案?

Is there a better solution than setting enable_seqscan?

UPD1

我的postgresql版本是9.6.3，work_mem = 4MB(尝试64MB)，random_page_cost = 4(尝试1.1)，max_parallel_workers_per_gather = 0(尝试4).

My postgresql version is 9.6.3, work_mem = 4MB (tried 64MB), random_page_cost = 4 (tried 1.1), max_parallel_workers_per_gather = 0 (tried 4).

UPD2

我尝试不使用随机数来填充 type 列，而是使用i / 10000来填充pg_stats.correlation = 1-仍为seqscan.

I have tried to fill type column not with random numbers, but with i / 10000 to make pg_stats.correlation = 1 - still seqscan.

UPD3

@jgh是100％正确:

@jgh is 100% right:

通常仅在表的行宽比某些索引宽得多时才会发生

This typically only happens when the table's row width is much wider than some indexes

我制作了大列data，现在postgres使用了索引.谢谢大家！

I've made large column data and now postgres use index. Thanks everyone!

为什么Postgres不在简单的GROUP BY上使用索引? [英] Why is Postgres not using index on a simple GROUP BY?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

为什么Postgres不在简单的GROUP BY上使用索引? [英] Why is Postgres not using index on a simple GROUP BY?

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭