示例查询以显示PostgreSQL中的基数估计错误 [英] Sample Query to show Cardinality estimation error in PostgreSQL

查看:230
本文介绍了示例查询以显示PostgreSQL中的基数估计错误的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用PostgreSQL9.3进行项目。我正在使用以下查询来显示选择性估计错误如何导致使用PostgreSQL8.3在TPC-H工作负载上的查询执行时间成倍增加。

I am working on a project using PostgreSQL9.3. I was using the below query to show how selectivity estimation errors can lead to multi-fold increase in query execution time on TPC-H workload using PostgreSQL8.3.

select 
    n_name, 
    sum(l_extendedprice * (1 - l_discount)) as revenue 
from 
    customer, 
    orders, 
    lineitem, 
    supplier, 
    nation,
    region 
where 
    c_custkey = o_custkey 
    and l_orderkey = o_orderkey 
    and l_suppkey = s_suppkey   
    and c_nationkey = s_nationkey 
    and s_nationkey = n_nationkey 
    and n_regionkey = r_regionkey 
    and (r_name='ASIA' or r_name='AFRICA') 
    and o_orderdate >= date '1994-01-01' 
    and o_orderdate < date '1994-01-01' + interval '1 year' 
    and l_shipdate <= l_receiptdate 
    and l_commitdate <= l_shipdate + integer '90' 
    and l_extendedprice <= 20000 
    and c_name like '%r#00%' 
    and c_acctbal <=2400 
group by 
    n_name 
order by    
    revenue desc

问题在于PostgreSQL8.3选择的计划涉及很多NestedLoop联接,因为对行项和客户的选择性估计是大错特错。我认为这主要是由于LIKE模式匹配。

The problem was that PostgreSQL8.3 was choosing a plan which involves lot of NestedLoop joins since selectivity estimation on lineitem and customer were wrong by big margin. I think this was majorly due to LIKE pattern matching. But the optimal plan should have been using Hash Joins.

最近,我为Project升级到PostgreSQL9.3,并观察到上面的查询不再给出错误的计划。我花了一些时间尝试查找对TPC-H 1GB数据具有大基数估计错误的查询,但到目前为止没有成功。是否有任何PostgreSQL极客知道TPC-H基准上的现成查询或任何显示PostgreSQL9.3中基数估计错误的查询?

Recently I upgraded to PostgreSQL9.3 for my Project and observed that the above query no longer gives a bad plan. I spent some amount of time trying to find a query with large cardinality estimation error on TPC-H 1GB data with no success till now. Does any PostgreSQL geeks know some off the shelf query on TPC-H benchmark or any query to show cardinality estimation error in PostgreSQL9.3

推荐答案

这是为了回答由@发表的评论第十二以及问题本身。

This is to answer the comment by @Twelfth as well as the question itself.

手册中本章的三引号:

使用显式控制计划器 JOIN 子句


显式内部联接语法( INNER JOIN CROSS JOIN 或未经修饰的 JOIN
在语义上是相同的 FROM 中列出了输入关系,因此
不会限制连接顺序。

Explicit inner join syntax (INNER JOIN, CROSS JOIN, or unadorned JOIN) is semantically the same as listing the input relations in FROM, so it does not constrain the join order.

...


强制计划者遵循以下列出的加入顺序显式
JOIN s,将 join_collapse_limit 运行时参数设置为1。(其他
可能的值在下面讨论。)

To force the planner to follow the join order laid out by explicit JOINs, set the join_collapse_limit run-time parameter to 1. (Other possible values are discussed below.)

...


以这种方式约束规划者的搜索是一种有用的技术
,它既可以减少规划时间,又可以将规划者定向到
好​​的查询计划

加粗强调。相反,出于测试目的,您可以滥用相同的内容将查询计划器定向到查询计划。阅读整个手册页。

Bold emphasis mine. Conversely, you can abuse the same to direct the query planner to a bad query plan for your testing purposes. Read the whole manual page. It should be instrumental.

此外,您还可以通过停用替代方法,一个接一个地禁用(仅在您的会话中最佳)。像这样:

Also, you can force nested loops by disabling alternative methods one by one (best in your session only). Like:

SET enable_hashjoin = off;

等等。

关于检查和设置参数:

Etc.
About checking and setting parameters:

  • Query a parameter (postgresql.conf setting) like "max_connections"

一种明显的方法是禁用自动清理,然后从表中添加/删除行。然后,查询计划器将使用过时的统计信息。请注意,其他一些命令也会更新统计信息。

One obvious way would be to disable autovacuum and add / remove rows from the table. Then the query planner is working with outdated statistics. Note that some other commands update statistics as well.

统计信息存储在目录表 pg_class 和<$ c中$ c> pg_statistics 。

Statistics are stored in the catalog tables pg_class and pg_statistics.

SELECT * FROM pg_class WHERE oid = 'mytable'::regclass;
SELECT * FROM pg_statistic WHERE starelid = 'mytable'::regclass;

这使我想到了另一个选择。您可以在这两个表中伪造条目。需要超级用户特权。

您不会以新手的身份来打我,但要向公众发出警告:如果破坏目录表中的内容,数据库(集群)可能会瘫痪。您已被警告。

This leads me to another option. You could forge entries in these two tables. Superuser privileges required.
You don't strike me as a newcomer, but a warning for the general public: If you break something in the catalog tables, your database (cluster) might go belly-up. You have been warned.

这篇关于示例查询以显示PostgreSQL中的基数估计错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆