一起执行多项功能而不会损失性能 [英] Execute multiple functions together without losing performance

查看:78
本文介绍了一起执行多项功能而不会损失性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这个过程,必须使用pl/pgsql进行一系列查询:

I have this process that has to make a series of queries, using pl/pgsql:

--process:
SELECT function1();
SELECT function2();
SELECT function3();
SELECT function4();

为了能够在一个调用中执行所有操作,我创建了一个流程函数,如下所示:

To be able to execute everything in one call, I created a process function as such:

CREATE OR REPLACE FUNCTION process()
  RETURNS text AS
$BODY$
BEGIN
    PERFORM function1();
    PERFORM function2();
    PERFORM function3();
    PERFORM function4();
    RETURN 'process ended';
END;
$BODY$
  LANGUAGE plpgsql

问题是,当我将每个函数单独花费的时间加起来时,总计为200秒,而函数process()花费的时间却超过了一个小时!

The problem is, when I sum the time that each function takes by itself, the total is 200 seconds, while the time that the function process() takes is more than one hour!

也许这是内存问题,但是我不知道应该更改postgresql.conf上的哪个配置.

Maybe it's a memory issue, but I don't know which configuration on postgresql.conf should I change.

该数据库在Debian 8中的PostgreSQL 9.4上运行.

The DB is running on PostgreSQL 9.4, in a Debian 8.

推荐答案

您评论了这4个功能必须连续运行 .因此,可以安全地假定每个函数都可以使用已被前一个函数修改过的表中的数据.那是我的主要嫌疑犯.

You commented that the 4 functions have to run consecutively. So it's safe to assume that each function works with data from tables that have been modified by the previous function. That's my prime suspect.

任何Postgres函数都在外部上下文的事务中运行.因此,如果打包到另一个函数中,则所有函数共享相同的事务上下文.每个人都可以看到以前功能对数据的影响. (即使影响对于其他并发事务仍然不可见.)但是统计信息不会立即更新.

Any Postgres function runs inside the transaction of the outer context. So all functions share the same transaction context if packed into another function. Each can see effects on data from previous functions, obviously. (Even though effects are still invisible to other concurrent transactions.) But statistics are not updated immediately.

查询计划基于统计信息涉及的对象. PL/pgSQL在实际执行语句之前不会计划它们,这会对您有利. 每个文档:

Query plans are based on statistics on involved objects. PL/pgSQL does not plan statements until they are actually executed, that would work in your favor. Per documentation:

在函数中首先执行每个表达式和SQL命令, PL/pgSQL解释器解析并分析命令以创建一个 使用SPI管理器的SPI_prepare函数准备的语句.​​

As each expression and SQL command is first executed in the function, the PL/pgSQL interpreter parses and analyzes the command to create a prepared statement, using the SPI manager's SPI_prepare function.

PL/pgSQL 可以缓存查询计划,但只能在同一会话中的 和(在pg 9.2中) +至少),只有在两次执行都显示相同的查询计划可以最好地重复工作之后.如果您怀疑这对您有问题,可以使用动态SQL解决该问题,该SQL每次都会强制执行新计划:

PL/pgSQL can cache query plans, but only within the same session and (in pg 9.2+ at least) only after a couple of executions have shown the same query plan to work best repeatedly. If you suspect this going wrong for you, you can work around it with dynamic SQL which forces a new plan every time:

EXECUTE 'SELECT function1()';

但是,我看到的最有可能的候选人是无效的统计信息,导致查询计划质量较差.函数内的SELECT/PERFORM语句(相同的内容)快速连续运行,没有

However, the most likely candidate I see is invalidated statistics that lead to inferior query plans. SELECT / PERFORM statements (same thing) inside the function are run in quick succession, there is no chance for autovacuum to kick in and update statistics between one function and the next. If one function substantially alters data in a table the next function is working with, the next function might base its query plan on outdated information. Typical example: A table with a few rows is filled with many thousands of rows, but the next plan still thinks a sequential scan is fastest for the "small" table. You state:

当我总结每个功能单独花费 的时间时,总计为 200秒,而函数process()花费的时间更多 超过一个小时!

when I sum the time that each function takes by itself, the total is 200 seconds, while the time that the function process() takes is more than one hour!

本身"到底是什么意思?您是在单个事务中还是在单个事务中运行它们?也许甚至之间有一段时间?这将允许自动清理更新统计信息(通常相当快),并可能基于更改后的统计信息导致完全不同的查询计划.

What exactly does "by itself" mean? Did you run them in a single transaction or in individual transactions? Maybe even with some time in between? That would allow autovacuum to update statistics (it's typically rather quick) and possibly lead to completely different query plans based on the changed statistic.

您可以使用

  • 用pgpsql编写的UDF调用的Postgres查询计划
  • 如果可以确定此类问题,则可以在语句之间强制使用ANALYZE.就此而言,对于仅几个SELECT/PERFORM语句,您也可以使用更简单的 SQL函数,并避免完全计划缓存(但请参见下文!):

    If you can identify such an issue, you can force ANALYZE in between statements. Being at it, for just a couple of SELECT / PERFORM statements you might as well use a simpler SQL function and avoid plan caching altogether (but see below!):

    CREATE OR REPLACE FUNCTION process()
      RETURNS text AS
    $func$
       SELECT function1();
    
       ANALYZE some_substantially_affected_table;
    
       SELECT function2();
       SELECT function3();
    
       ANALYZE some_other_table;
    
       SELECT function4();
       SELECT 'process ended';  -- only last result is returned
    $func$  LANGUAGE sql;
    

    此外,只要我们看不到您调用的函数的实际代码,就会出现任意数量的其他隐藏效果.
    示例:您可以SET LOCAL ...一些配置参数来提高function1()的性能.如果在单独的事务中调用,则不会影响其余事务.效果仅持续到交易结束.但是,如果在单个事务中调用它,也会影响其余事务...

    Also, as long as we don't see the actual code of your called functions, there can be any number of other hidden effects.
    Example: you could SET LOCAL ... some configuration parameter to improve the performance of your function1(). If called in separate transactions that won't affect the rest. The effect only last till the end of the transaction. But if called in a single transaction it affects the rest, too ...

    基础:

    • Difference between language sql and language plpgsql in PostgreSQL functions
    • PostgreSQL Stored Procedure Performance

    加号::事务累积锁,这将束缚越来越多的资源,并可能导致与并发进程的摩擦增加.在事务结束时将释放所有锁.最好在可能的情况下在 单独的事务中运行大型函数 ,而不是将其包装在单个函数中(从而进行事务处理).最后一项与 @klin

    Plus: transaction accumulate locks, which binds an increasing amount of resources and may cause increasing friction with concurrent processes. All locks are released at the end of the transaction. It's better to run big functions in separate transactions if at all possible, not wrapped in a single function (and thus transaction). That last item is related to what @klin and IMSoP already covered.

    这篇关于一起执行多项功能而不会损失性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆