Can I split a query into multiple queries or create parallelism to speed up a query?


Problem description

I have a table avl_pool, and I have a function that finds the link on the map nearest to a given (x, y) position.

The performance of this select scales linearly: the function takes ~8 ms per call, so running the select over 1,000 rows takes about 8 seconds. As the sample below shows, 20,000 rows take about 162 seconds.

SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 20000

"Index Scan using avl_pool_pkey on avl_pool  (cost=0.43..11524.76 rows=19143 width=28) (actual time=8.793..162805.384 rows=20000 loops=1)"
"  Index Cond: ((avl_id >= 1) AND (avl_id <= 20000))"
"  Buffers: shared hit=19879838"
"Planning time: 0.328 ms"
"Execution time: 162812.113 ms"

Using pgAdmin, I found that if I run each half of the range in a separate window at the same time, the execution time is actually cut in half. So it looks like the server can handle multiple concurrent requests to the same table/function without a problem.

-- window 1
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 1 AND 10000 

Total query runtime: 83792 ms.

-- window 2
SELECT avl_id, x, y, azimuth, map.get_near_link(X, Y, AZIMUTH)
FROM avl_db.avl_pool         
WHERE avl_id between 10001 AND 20000

Total query runtime: 84047 ms.

So how should I approach this scenario to improve performance?

From the C# side, I guess I could create multiple threads, have each one send a portion of the range, and then join all the data in the client. So instead of one query over 20k rows taking 162 seconds, I could send 10 queries of 2,000 rows each and finish in ~16 seconds. There may be some overhead in joining the results, but it shouldn't be large compared with 160 seconds.

Or is there a different approach I should consider? A pure SQL solution would be even better.

@PeterRing I don't think the function code is relevant, but here it is anyway.

CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
  RETURNS map.get_near_link AS
$BODY$
DECLARE
    strPoint TEXT;
    sRow map.get_near_link;
  BEGIN
    strPoint = 'POINT('|| X || ' ' || Y || ')';
    RAISE DEBUG 'GetLink strPoint % -- Azim %', strPoint, Azim;

    WITH index_query AS (
        SELECT --Seg_ID,
               Link_ID,
               azimuth,
               TRUNC(ST_Distance(ST_GeomFromText(strPoint,4326), geom  )*100000)::INTEGER AS distance,
               sentido,
               --ST_AsText(geom),
               geom
        FROM map.vzla_seg S
        WHERE
            ABS(Azim - S.azimuth) < 30 OR
            ABS(Azim - S.azimuth) > 330
        ORDER BY
            geom <-> ST_GeomFromText(strPoint, 4326)
        LIMIT 101
    )
    SELECT i.Link_ID, i.Distance, i.Sentido, v.geom INTO sRow
    FROM
        index_query i INNER JOIN
        map.vzla_rto v ON i.link_id = v.link_id
    ORDER BY
        distance LIMIT 1;

    RAISE DEBUG 'GetLink distance % ', sRow.distance;
    IF sRow.distance > 50 THEN
        sRow.link_id = -1;
    END IF;

    RETURN sRow;
  END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE
  COST 100;
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC)
  OWNER TO postgres;

Recommended answer

Consider marking your map.get_near_link function as PARALLEL SAFE. This tells the database engine that it is allowed to try to generate a parallel plan when executing the function:

PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.
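As a minimal sketch of the labels described above, the marking can also be changed on an existing function without recreating it, and the current label can be read from the system catalog. This assumes PostgreSQL 9.6 or later (where parallel labels exist) and that the function only reads data; the signature is the one shown in the question.

```sql
-- Change only the parallel label of the existing function.
ALTER FUNCTION map.get_near_link(NUMERIC, NUMERIC, NUMERIC) PARALLEL SAFE;

-- Inspect the label: 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT p.proname, p.proparallel
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'map'
  AND p.proname = 'get_near_link';
```

A function that is labelled PARALLEL SAFE but actually writes data or depends on session state can fail or misbehave when run in a worker, so the label should reflect what the function body really does.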

Several settings can prevent the query planner from generating a parallel query plan under any circumstances. See this section of the PostgreSQL documentation:

15.2. When Can Parallel Query Be Used?
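The settings that documentation section refers to can be checked from any session. The following is a sketch, not an exhaustive list, and the value 4 is only an illustrative choice that should be sized to the server's cores and worker limits.

```sql
-- A value of 0 disables parallel plans entirely.
SHOW max_parallel_workers_per_gather;

-- Parallel workers are taken from this pool of background processes.
SHOW max_worker_processes;

-- Parallel query needs dynamic shared memory; 'none' rules it out.
SHOW dynamic_shared_memory_type;

-- Allow up to 4 workers per Gather node for the current session only
-- (4 is an assumed example value).
SET max_parallel_workers_per_gather = 4;
```

Even with these settings in place, the planner only chooses a parallel plan when it estimates that plan to be cheaper than the serial one.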

From my reading, you may be able to get a parallel plan if you refactor your function like this:

CREATE OR REPLACE FUNCTION map.get_near_link(
    x NUMERIC,
    y NUMERIC,
    azim NUMERIC)
RETURNS TABLE
-- Adjust the column types below to match your tables.
(Link_ID INTEGER, Distance INTEGER, Sentido TEXT, Geom GEOGRAPHY)
AS
$$
        SELECT 
               S.Link_ID,
               TRUNC(ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000)::INTEGER AS distance,
               S.sentido,
               v.geom
        FROM (
          -- Keep only segments whose azimuth is within 30 degrees of azim
          -- (equivalent to the original "< 30 OR > 330" test).
          SELECT *
          FROM map.vzla_seg seg
          WHERE ABS(azim - seg.azimuth) NOT BETWEEN 30 AND 330
        ) S
          INNER JOIN map.vzla_rto v
            ON S.link_id = v.link_id
        WHERE
            ST_Distance(ST_GeomFromText('POINT('|| X || ' ' || Y || ')',4326), S.geom) * 100000 < 50
        ORDER BY
            S.geom <-> ST_GeomFromText('POINT('|| X || ' ' || Y || ')', 4326)
        LIMIT 1
$$
LANGUAGE SQL
PARALLEL SAFE -- Include this parameter
;

If the query optimiser generates a parallel plan when executing this function, you won't need to implement your own parallelisation logic.
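One way to check whether the optimiser actually chose a parallel plan is to run the question's query under EXPLAIN. This is a sketch that reuses the query from the question unchanged; whether a parallel plan appears depends on the PostgreSQL version, the settings in effect, and the planner's cost estimates.

```sql
-- ANALYZE executes the query, so expect it to take as long as a normal run.
EXPLAIN (ANALYZE, BUFFERS)
SELECT avl_id, x, y, azimuth, map.get_near_link(x, y, azimuth)
FROM avl_db.avl_pool
WHERE avl_id BETWEEN 1 AND 20000;
```

A parallel plan shows a Gather node with "Workers Planned" and "Workers Launched" lines. If the output still shows only a plain Index Scan, as in the plan in the question, the query ran serially and splitting the range across client connections remains the fallback.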
