为什么规划师对具有不同波动性的函数提出不同的结果? [英] Why is the planner coming up with different results for functions with different volatilities?

查看:114
本文介绍了为什么规划师对具有不同波动性的函数提出不同的结果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

这个问题是作为后续和 SQL函数与没有函数包装器的查询相比非常慢。我应该注意到,我不认为这是重复的,因为那个问题是要求解决具体问题。我在这里要求了解更多有关这种行为的信息,并展示它如何被复制。 (为了证明这种差异,您可以在我们讨论过的行为的接受答案上看到相当长的评论线索,而且我觉得这是脱节的话题,尤其是考虑到长度。)



我有一个功能。下面是一个展示感兴趣行为的示例:

  CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE num INT,letter TEXT)
VOLATILE
LANGUAGE SQL
AS $$
SELECT *
FROM(VALUES(1,'a'),(2,'b '),(3,'c'),(4,'d'),(5,'e'))x
LIMIT $ 1
$$;

当我运行这个 EXPLAIN 时:

  EXPLAIN ANALYZE SELECT * FROM test(10); 

我在psql中得到了这个结果(我已经删除了一个巨大的Query Plan头文件):

 功能扫描测试(成本= 0.25..10.25行= 1000宽度= 36)(实际时间= 0.125..0.136行= 5循环= 1)
总运行时间:0.179毫秒
(2行)



注意行估计。它估计有1000行。



但是,如果我将函数更改为 STABLE IMMUTABLE



$ p $ CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE(num INT, letter TEXT)
STABLE
语言SQL
AS $$
SELECT *
FROM(VALUES(1,'a'),(2,'b'), (3,'c'),(4,'d'),(5,'e'))x
LIMIT $ 1
$$;

然后,同样的 EXPLAIN 给了我一个不同的计划:

pre $ 限制(成本= 0.00..0.06行= 5宽度= 36)(实际时间= 0.010..0.050行= 5个循环= 1)
- >值在* VALUES *上扫描(成本= 0.00..0.06行= 5宽度= 36)(实际时间= 0.005..0.018行= 5个循环= 1)
总运行时间:0.087 ms
(3行)

现在它可以正确估计5行,并且它显示了包含在查询中的计划功能。成本要高一个数量级。运行时也下降了。 (查询太短,可能不是特别重要。)



鉴于处理更多数据的链接问题以及非常显着的性能差异,似乎计划者实际上是根据函数是否是 VOLATILE STABLE / IMMUTABLE



计划者在这里做什么,我可以在哪里阅读一些文档?
$ b

这些测试在PG 9.3中运行。

解决方案

1000行
$ b

1000 估计行数是 CREATE FUNCTION


execution_cost



该功能的预计执行成本n,以cpu_operator_cost为单位。如果函数返回
集合,则这是每个返回行的成本。如果未指定成本,则假定C语言和内部函数为
1单位,所有其他语言的函数为100个单位
。较大的值会导致计划人员
尝试避免更多次数地评估该函数。

result_rows



一个正数,给出计划器
应该预期函数返回的估计行数。当函数声明返回一个集合时,这只允许
。默认的假设
是1000行。

当一个函数声明为volatile时,它不要求内联,所以this默认值为 result_rows 成立。



另一方面,当它在第二个查询中被内联测试,行数将被估计,就好像函数的主体已被移入查询并且函数声明不存在一样。由于可以直接评估 VALUES 子句,因此可以在第二次测试中得出准确的估计值。



规划师在这里做什么,我在哪里可以阅读一些文档?

一般来说,规划师的优化战略不在主要文件中解释。他们在邮件列表中进行了讨论,并在源代码注释中提到,幸运的是这些评论非常清晰并且写得很好(与平均源代码相比)。在函数内联的情况下,我相信 inline_set_returning_functions inline_set_returning_function 揭示了驱动这一特定优化的大部分规则。 (警告:上面的链接进入当前的主分支,可随时更改或漂移)。


This question comes as a follow up to and a result of SQL function very slow compared to query without function wrapper. I should note that I don't consider this a duplicate, since that question was asking for a solution to a specific problem. I am asking for more information about the behavior in general here, and demonstrating how it can be reproduced. (To demonstrate the difference, you can see a fairly long comment thread on the accepted answer where we discussed the behavior, and I felt it was getting off topic, especially given the length.)

I have a function. Here's a sample one that exhibits the behavior of interest:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  VOLATILE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1
  $$;

When I run this EXPLAIN:

EXPLAIN ANALYZE SELECT * FROM test(10);

I get this result in psql (where I've removed a giant "Query Plan" header):

 Function Scan on test  (cost=0.25..10.25 rows=1000 width=36) (actual time=0.125..0.136 rows=5 loops=1)
 Total runtime: 0.179 ms
(2 rows)

Take note of the row estimation. It estimates 1000 rows.

But, if I change the function to STABLE or IMMUTABLE:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  STABLE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1
  $$;

Then the same EXPLAIN gives me a different plan:

 Limit  (cost=0.00..0.06 rows=5 width=36) (actual time=0.010..0.050 rows=5 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=36) (actual time=0.005..0.018 rows=5 loops=1)
 Total runtime: 0.087 ms
(3 rows)

Now it correctly estimates 5 rows, and it shows a plan for the query contained inside the function. The cost is an order of magnitude higher. The runtime went down as well. (The query is so short, that might not be especially significant.)

In light of the linked question dealing with much more data and having a very significant performance difference, it would seem that the planner is actually doing something different based on whether the function is VOLATILE or STABLE/IMMUTABLE.

What exactly is the planner doing here, and where can I read some documentation on it?

These tests were run in PG 9.3.

解决方案

It estimates 1000 rows

1000 estimated rows is a default value documented in CREATE FUNCTION:

execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

result_rows

A positive number giving the estimated number of rows that the planner should expect the function to return. This is only allowed when the function is declared to return a set. The default assumption is 1000 rows.

When a function is declared volatile, it's asking to not be inlined, so this default value for result_rows holds.

On the other hand, when it's getting inlined in a query like in your second test, the number of rows is going to be estimated as if the body of the function had been moved into the query and the function declaration didn't exist. This leads in the second test to an exact estimate since the VALUES clause can be directly evaluated.

What exactly is the planner doing here, and where can I read some documentation on it?

In general, the planner's optimization strategies are not explained in the main documentation. They're discussed in mailing lists and mentioned in the source code comments, which fortunately tend to be exceptionally clear and well written (compared to average source code). In the case of function inlining, I believe the comments of inline_set_returning_functions and inline_set_returning_function reveal most of the rules driving this particular optimization. (warning: above links are into the current master branch which is subject to change or drift anytime).

这篇关于为什么规划师对具有不同波动性的函数提出不同的结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆