How to force evaluation of subquery before joining / pushing down to foreign server

Problem description

Suppose I want to query a big table with a few WHERE filters. I am using Postgres 11 and a foreign table; foreign data wrapper (FDW) is clickhouse_fdw. But I am also interested in a general solution.

I can do so as follows:

SELECT id,c1,c2,c3 from big_table where id=3 and c1=2

My FDW is able to do the filtering on the remote foreign data source, ensuring that the above query is quick and doesn't pull down too much data.

The above works the same if I write:

SELECT id,c1,c2,c3 from big_table where id IN (3,4,5) and c1=2

I.e., all of the filtering is sent downstream.

However, if the filtering I'm trying to do is slightly more complex:

SELECT bt.id,bt.c1,bt.c2,bt.c3
from big_table bt
join lookup_table l on bt.id=l.id
where c1=2 and l.x=5

then the query planner decides to filter on c1=2 remotely but apply the other filter locally.

In my use case, calculating which ids have l.x=5 first and then sending those off to be filtered remotely will be much quicker, so I tried to write it the following way:

SELECT id,c1,c2,c3
from big_table
where c1=2
and id IN (select id from lookup_table where x=5)

However, the query planner still decides to perform the second filter locally on ALL of the results from big_table that satisfy c1=2, which is very slow.

Is there some way I can "force" (select id from lookup_table where x=5) to be pre-calculated and sent as part of a remote filter?

Recommended answer

Foreign data wrapper

Typically, joins or any derived tables from subqueries or CTEs are not available on the foreign server and have to be executed locally. I.e., all rows remaining after the simple WHERE clause in your example have to be retrieved and processed locally like you observed.

If all else fails you can execute the subquery SELECT id FROM lookup_table WHERE x = 5 and concatenate results into the query string.

More conveniently, you can automate this with dynamic SQL and EXECUTE in a PL/pgSQL function. Like:

CREATE OR REPLACE FUNCTION my_func(_c1 int, _l_id int)
   RETURNS TABLE(id int, c1 int, c2 int, c3 int) AS
$func$
BEGIN
   RETURN QUERY EXECUTE
     'SELECT id,c1,c2,c3 FROM big_table
      WHERE  c1 = $1
      AND    id = ANY ($2)'
   USING _c1
       , ARRAY(SELECT l.id FROM lookup_table l WHERE l.x = _l_id);
END
$func$  LANGUAGE plpgsql;
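
As a usage sketch, assuming the function above was created as written and that both arguments are plain integers as in the question, a call reproducing the original filter (c1 = 2, x = 5) could look like this:

```sql
-- Rows of big_table with c1 = 2 whose id appears in lookup_table with x = 5.
-- Note: despite its name, the second parameter _l_id is compared against
-- lookup_table.x inside the function, not against an id.
SELECT * FROM my_func(2, 5);
```

The array of ids is computed locally first and passed to the dynamic statement as a single parameter value. Whether the FDW then ships the resulting `id = ANY ($2)` condition to the remote side still depends on the wrapper.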

Related:

  • Table name as a PostgreSQL function parameter

Or try a search on SO.

Or you might use the meta-command \gexec in psql. See:

  • Filter column names from existing table for SQL DDL statement
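
A minimal sketch of the \gexec route, assuming psql as the client and an integer id column (so the values need no quoting). The first statement only builds the text of the final query from the local lookup; \gexec then runs that text, so the foreign table sees a plain list of constants:

```sql
-- Run in psql. The outer SELECT returns one row with one column holding the
-- text of the final query; \gexec then executes that text as a statement.
SELECT format(
         'SELECT id, c1, c2, c3 FROM big_table WHERE c1 = 2 AND id IN (%s)',
         -- COALESCE keeps the statement valid when no row has x = 5:
         -- "id IN (NULL)" matches nothing instead of raising a syntax error.
         COALESCE(string_agg(id::text, ', '), 'NULL')
       )
FROM   lookup_table
WHERE  x = 5
\gexec
```

This is the manual "concatenate results into the query string" approach in a single step. For very long id lists, the generated statement can grow large, so the array parameter in the function above may be the more practical variant.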

Or this might work: (Feedback says it does not work.)

SELECT id,c1,c2,c3
FROM   big_table
WHERE  c1 = 2
AND    id = ANY (ARRAY(SELECT id FROM lookup_table WHERE x = 5));

Testing locally, I get a query plan like this:

Index Scan using big_table_idx on big_table (cost= ...)
  Index Cond: (id = ANY ($0))
  Filter: (c1 = 2)
  InitPlan 1 (returns $0)
    ->  Seq Scan on lookup_table  (cost= ...)
          Filter: (x = 5)

Emphasis mine: note the parameter $0 in the plan, which stands for the array returned by InitPlan 1.

Related question concerning postgres_fdw:

  • postgres_fdw: possible to push data to foreign server for join?
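
To check what actually reaches the foreign server for any of the variants above, EXPLAIN with the VERBOSE option is a reasonable first step. With postgres_fdw, the Foreign Scan node then prints a "Remote SQL:" line showing the statement sent to the remote side. Whether clickhouse_fdw prints the remote query in the same form is an assumption to verify on your installation.

```sql
-- Shows the plan without running the query. Look at the Foreign Scan node:
-- conditions that appear in the remote query are pushed down, while
-- conditions listed under "Filter:" are applied locally after fetching rows.
EXPLAIN (VERBOSE)
SELECT id, c1, c2, c3
FROM   big_table
WHERE  c1 = 2
AND    id = ANY (ARRAY(SELECT id FROM lookup_table WHERE x = 5));
```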

That's a different story. Just use a CTE. But I don't expect that to help with the FDW.

WITH cte AS (SELECT id FROM lookup_table WHERE x = 5)
SELECT id,c1,c2,c3
FROM   big_table b
JOIN   cte USING (id)
WHERE  b.c1 = 2;

PostgreSQL 12 changed (improved) behavior, so that CTEs can be inlined like subqueries, given some preconditions. But, quoting the manual:


You can override that decision by specifying MATERIALIZED to force separate calculation of the WITH query

So:

WITH cte AS MATERIALIZED (SELECT id FROM lookup_table WHERE x = 5)
...
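
Written out in full, and assuming the remainder is the same join as in the plain CTE version earlier in this answer, the statement might read:

```sql
-- Requires PostgreSQL 12 or later; the MATERIALIZED keyword is a syntax
-- error in Postgres 11, where CTEs are always computed separately anyway.
WITH cte AS MATERIALIZED (
   SELECT id FROM lookup_table WHERE x = 5
   )
SELECT b.id, b.c1, b.c2, b.c3
FROM   big_table b
JOIN   cte USING (id)
WHERE  b.c1 = 2;
```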

Typically, none of this should be necessary if your DB server is configured properly and column statistics are up to date. But there are corner cases with uneven data distribution ...
