为什么规划师对具有不同波动性的函数提出不同的结果? [英] Why is the planner coming up with different results for functions with different volatilities?
问题描述
这个问题是作为后续和 SQL函数与没有函数包装器的查询相比非常慢。我应该注意到,我不认为这是重复的,因为那个问题是要求解决具体问题。我在这里要求了解更多有关这种行为的信息,并展示它如何被复制。 (为了证明这种差异,您可以在我们讨论过的行为的接受答案上看到相当长的评论线索,而且我觉得这是脱节的话题,尤其是考虑到长度。)
我有一个功能。下面是一个展示感兴趣行为的示例:
CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE num INT,letter TEXT)
VOLATILE
LANGUAGE SQL
AS $$
SELECT *
FROM(VALUES(1,'a'),(2,'b '),(3,'c'),(4,'d'),(5,'e'))x
LIMIT $ 1
$$;
当我运行这个 EXPLAIN
时:
EXPLAIN ANALYZE SELECT * FROM test(10);
我在psql中得到了这个结果(我已经删除了一个巨大的Query Plan头文件):
功能扫描测试(成本= 0.25..10.25行= 1000宽度= 36)(实际时间= 0.125..0.136行= 5循环= 1)
总运行时间:0.179毫秒
(2行)
注意行估计。它估计有1000行。
但是,如果我将函数更改为 STABLE
或 IMMUTABLE
:
$ p $
CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE(num INT, letter TEXT)
STABLE
语言SQL
AS $$
SELECT *
FROM(VALUES(1,'a'),(2,'b'), (3,'c'),(4,'d'),(5,'e'))x
LIMIT $ 1
$$;
然后,同样的 EXPLAIN
给了我一个不同的计划:
pre $ 限制(成本= 0.00..0.06行= 5宽度= 36)(实际时间= 0.010..0.050行= 5个循环= 1)
- >值在* VALUES *上扫描(成本= 0.00..0.06行= 5宽度= 36)(实际时间= 0.005..0.018行= 5个循环= 1)
总运行时间:0.087 ms
(3行)
现在它可以正确估计5行,并且它显示了包含在查询中的计划功能。成本要高一个数量级。运行时也下降了。 (查询太短,可能不是特别重要。)
鉴于处理更多数据的链接问题以及非常显着的性能差异,似乎计划者实际上是根据函数是否是 VOLATILE
或 STABLE
/ IMMUTABLE
。
计划者在这里做什么,我可以在哪里阅读一些文档?
$ b
这些测试在PG 9.3中运行。
1000行 execution_cost 该功能的预计执行成本n,以cpu_operator_cost为单位。如果函数返回 result_rows 一个正数,给出计划器 当一个函数声明为volatile时,它不要求内联,所以this默认值为 另一方面,当它在第二个查询中被内联测试,行数将被估计,就好像函数的主体已被移入查询并且函数声明不存在一样。由于可以直接评估 规划师在这里做什么,我在哪里可以阅读一些文档? 。 一般来说,规划师的优化战略不在主要文件中解释。他们在邮件列表中进行了讨论,并在源代码注释中提到,幸运的是这些评论非常清晰并且写得很好(与平均源代码相比)。在函数内联的情况下,我相信 inline_set_returning_functions 和 inline_set_returning_function 揭示了驱动这一特定优化的大部分规则。 (警告:上面的链接进入当前的主分支,可随时更改或漂移)。 This question comes as a follow up to and a result of SQL function very slow compared to query without function wrapper. I should note that I don't consider this a duplicate, since that question was asking for a solution to a specific problem. I am asking for more information about the behavior in general here, and demonstrating how it can be reproduced. (To demonstrate the difference, you can see a fairly long comment thread on the accepted answer where we discussed the behavior, and I felt it was getting off topic, especially given the length.) I have a function. Here's a sample one that exhibits the behavior of interest: When I run this I get this result in psql (where I've removed a giant "Query Plan" header): Take note of the row estimation. It estimates 1000 rows. But, if I change the function to Then the same Now it correctly estimates 5 rows, and it shows a plan for the query contained inside the function. The cost is an order of magnitude higher. The runtime went down as well. (The query is so short, that might not be especially significant.) In light of the linked question dealing with much more data and having a very significant performance difference, it would seem that the planner is actually doing something different based on whether the function is What exactly is the planner doing here, and where can I read some documentation on it? These tests were run in PG 9.3. It estimates 1000 rows execution_cost A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a
set, this is the cost per returned row. If the cost is not specified,
1 unit is assumed for C-language and internal functions, and 100 units
for functions in all other languages. Larger values cause the planner
to try to avoid evaluating the function more often than necessary. result_rows A positive number giving the estimated number of rows that the planner
should expect the function to return. This is only allowed
when the function is declared to return a set. The default assumption
is 1000 rows. When a function is declared volatile, it's asking to not be inlined, so this default value for On the other hand, when it's getting inlined in a query like in your second test, the number of rows is going to be estimated as if the body of the function had been moved into the query and the function declaration didn't exist. This leads in the second test to an exact estimate since the What exactly is the planner doing here, and where can I read some documentation on it? In general, the planner's optimization strategies are not explained in the main documentation. They're discussed in mailing lists and mentioned in the source code comments, which fortunately tend to be exceptionally clear and well written (compared to average source code). In the case of function inlining, I believe the comments of inline_set_returning_functions and inline_set_returning_function reveal most of the rules driving this particular optimization. (warning: above links are into the current master branch which is subject to change or drift anytime). 这篇关于为什么规划师对具有不同波动性的函数提出不同的结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
$ b 1000
估计行数是 CREATE FUNCTION
:
集合,则这是每个返回行的成本。如果未指定成本,则假定C语言和内部函数为
1单位,所有其他语言的函数为100个单位
。较大的值会导致计划人员
尝试避免更多次数地评估该函数。
应该预期函数返回的估计行数。当函数声明返回一个集合时,这只允许
。默认的假设
是1000行。
result_rows
成立。
VALUES
子句,因此可以在第二次测试中得出准确的估计值。
CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE(num INT, letter TEXT)
VOLATILE
LANGUAGE SQL
AS $$
SELECT *
FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
LIMIT $1
$$;
EXPLAIN
:EXPLAIN ANALYZE SELECT * FROM test(10);
Function Scan on test (cost=0.25..10.25 rows=1000 width=36) (actual time=0.125..0.136 rows=5 loops=1)
Total runtime: 0.179 ms
(2 rows)
STABLE
or IMMUTABLE
:CREATE OR REPLACE FUNCTION test(INT)
RETURNS TABLE(num INT, letter TEXT)
STABLE
LANGUAGE SQL
AS $$
SELECT *
FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
LIMIT $1
$$;
EXPLAIN
gives me a different plan: Limit (cost=0.00..0.06 rows=5 width=36) (actual time=0.010..0.050 rows=5 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=36) (actual time=0.005..0.018 rows=5 loops=1)
Total runtime: 0.087 ms
(3 rows)
VOLATILE
or STABLE
/IMMUTABLE
.1000
estimated rows is a default value documented in CREATE FUNCTION
:
result_rows
holds.VALUES
clause can be directly evaluated.