PostgresSQL 嵌套循环 - 规划器在执行 INNER JOIN 时何时决定使用嵌套循环? [英] PostgresSQL Nested Loops - When does the planner decide to use Nested Loop when doing an INNER JOIN?

查看:48
本文介绍了PostgresSQL 嵌套循环 - 规划器在执行 INNER JOIN 时何时决定使用嵌套循环?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 INNER JOIN 运行查询,其中规划器决定使用嵌套循环.我发现它与 WHERE 条件有关,因为我尝试使用不同的 WHERE 条件编写查询,因此它返回相同的结果但不使用嵌套循环.

I am running a query with an INNER JOIN where the planner decides to use a Nested Loop. I've figured out that it has do with the WHERE conditions as I have tried writing the query with different WHERE conditions so it returns the same result but does not use a Nested Loop.

我的问题是,当查询看起来相同,因为它们都返回相同的结果时,为什么规划器决定做出不同的决定?使用嵌套循环,查询在 77 秒内运行,在没有嵌套循环的情况下运行 13 秒,并且在 13 秒内运行的查询非常丑陋和不雅,让我认为有更好的编写方式.

My question is why has the planner decided to make the different decisions when the queries appear to be identical as they both return the same result? The query runs in 77 secs with the Nested Loop and in 13 sec without, and the query that runs in 13 sec is quite ugly and inelegant making me think there is a better way to write it.

这是两个查询.请注意,两者之间的区别在于 WHERE 子句如何按日期过滤,其中第一个使用 BETWEEN,第二个使用一系列 OR 语句.我知道 current_date 被包装在它们自己的子查询中很奇怪,但那是因为这些查询使用了外部数据包装器.这允许将 current_date 作为不可变对象传递以大大提高性能.

Here are the two queries. Note that the difference between the two is how the WHERE clause filters by date where the first uses BETWEEN and the second uses a series of OR statements. I am aware that it's strange that current_date is wrapped in their own subqueries but that is because these queries are using foreign data wrappers. This allows current_date to be passed as an immutable object to greatly speed up performance.

SELECT ROUND(AVG(m.forecast - w.wind),6) from pjm.wind_forecast_recent w
    INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
  WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND 
            (w.pricedate BETWEEN (SELECT current_date-6) AND (SELECT current_date));

-----------

SELECT ROUND(AVG(m.forecast - w.wind),6) from pjm.wind_forecast_recent w
    INNER JOIN pjm.load_forecast_recent m ON w.pricedate = m.pricedate AND w.hour = m.hour
  WHERE w.hour = 5 AND m.area = 'RTO_COMBINED' AND (
    w.pricedate = (SELECT current_date-6) OR
    w.pricedate = (SELECT current_date-5) OR
    w.pricedate = (SELECT current_date-4) OR
    w.pricedate = (SELECT current_date-3) OR
    w.pricedate = (SELECT current_date-2) OR
    w.pricedate = (SELECT current_date-1) OR
    w.pricedate = (SELECT current_date))

这里是各自的解释分析:

And here are the respective EXPLAIN ANALYZE:

Aggregate  (cost=842341.01..842341.02 rows=1 width=32) (actual time=77120.088..77120.089 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=1)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Nested Loop  (cost=840333.25..842340.97 rows=1 width=18) (actual time=14719.661..77119.994 rows=7 loops=1)
        ->  Foreign Scan on wind_forecast_recent w  (cost=242218.45..242218.49 rows=1 width=18) (actual time=3184.714..3184.720 rows=7 loops=1)
        ->  Foreign Scan on load_forecast_recent m  (cost=598114.80..600122.47 rows=1 width=16) (actual time=10531.723..10531.724 rows=1 loops=7)
Planning Time: 744.979 ms
Execution Time: 77227.512 ms

Aggregate  (cost=841657.94..841657.95 rows=1 width=32) (actual time=13683.022..13683.023 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1)
  InitPlan 2 (returns $1)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 3 (returns $2)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 4 (returns $3)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 5 (returns $4)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 6 (returns $5)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 7 (returns $6)
    ->  Result  (cost=0.00..0.01 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
  ->  Foreign Scan  (cost=833725.15..841657.83 rows=1 width=18) (actual time=13682.974..13682.977 rows=7 loops=1)
        Relations: (pjm.wind_forecast_recent w) INNER JOIN (pjm.load_forecast_recent m)
Planning Time: 332.870 ms
JIT:
  Functions: 16
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 4.163 ms, Inlining 15.088 ms, Optimization 44.489 ms, Emission 28.064 ms, Total 91.804 ms
Execution Time: 13724.094 ms

我在 Ubuntu 18.04 服务器上运行 PostgreSQL 12.1.

I am running PostgreSQL 12.1 on an Ubuntu 18.04 server.

如果您有任何其他问题,请告诉我.谢谢!

Let me know if you have any further questions. Thanks!

推荐答案

planner 并不是基于深层推理来决定使用某种连接策略,它只是构建所有可能的连接策略,估算成本并选择最便宜的一种.

The planner does not decide to use a certain join strategy based on deep reasoning, it simply constructs all possible join strategies, estimates the cost and chooses the cheapest one.

也就是说,如果外表很小,嵌套循环连接通常是最好的选择,这样内循环就不必经常执行.此外,内表连接条件的索引可以大大降低嵌套循环连接的成本,使其成为一种有吸引力的策略.

That said, nested loop joins are usually the best choice if the outer table is small, so that the inner loop does not have to be executed often. Also, an index on the join condition of the inner table can greatly reduce the cost of a nested loop join and make it an attractive strategy.

在你的情况下,错误的选择是由于估计错误:

In your case, the bad choice is due to a mis-estimate:

Foreign Scan on wind_forecast_recent w  (cost=... rows=1 ...) (actual ... rows=7 ...)

这会导致内循环执行 7 次而不是一次,因此执行时间是 70 秒而不是 10 秒.

That causes the inner loop to be executed 7 times rather than once, so that the execution time is 70 seconds rather than 10.

您应该收集 wind_forecast_recent 上的表统计信息:

You should collect table statistics on wind_forecast_recent:

ANALYZE wind_forecast_recent;

记住自动分析处理外部表;你必须自己照顾.

Remember that autoanalyze does not treat foreign tables; you have to take care of that yourself.

如果这不起作用,您可以尝试在外部表上设置 use_remote_estimate 选项,并确保远程数据库上的表统计信息准确.

If that doesn't do the trick, you can try setting the use_remote_estimate option on the foreign table and make sure that the table statistics are accurate on the remote database.

这篇关于PostgresSQL 嵌套循环 - 规划器在执行 INNER JOIN 时何时决定使用嵌套循环?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆