SQL function very slow compared to query without function wrapper

Published: 2018/4/17 10:28:52 | Tags: postgresql, function, postgresql-performance, sql-execution-plan


Question

I have this PostgreSQL 9.4 query that runs very fast (~12ms):

    SELECT
        auth_web_events.id,
        auth_web_events.time_stamp,
        auth_web_events.description,
        auth_web_events.origin,
        auth_user.email,
        customers.name,
        auth_web_events.client_ip
    FROM
        public.auth_web_events,
        public.auth_user,
        public.customers
    WHERE
        auth_web_events.user_id_fk = auth_user.id AND
        auth_user.customer_id_fk = customers.id AND
        auth_web_events.user_id_fk = 2
    ORDER BY
        auth_web_events.id DESC;

But if I embed it into a function, the query runs very slowly, as if it were going through every record. What am I missing? I have ~1M rows of data, and I want to simplify my database layer by storing the large queries in functions and views.

    CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
    RETURNS TABLE(
        id int,
        time_stamp timestamp with time zone,
        description text,
        origin text,
        userlogin text,
        customer text,
        client_ip inet
    ) AS
    $func$
    SELECT
        auth_web_events.id,
        auth_web_events.time_stamp,
        auth_web_events.description,
        auth_web_events.origin,
        auth_user.email AS user,
        customers.name AS customer,
        auth_web_events.client_ip
    FROM
        public.auth_web_events,
        public.auth_user,
        public.customers
    WHERE
        auth_web_events.user_id_fk = auth_user.id AND
        auth_user.customer_id_fk = customers.id AND
        auth_web_events.user_id_fk = $1
    ORDER BY
        auth_web_events.id DESC;
    $func$ LANGUAGE SQL;

The query plan is:

    Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)
      Sort Key: auth_web_events.id
      Sort Method: quicksort  Memory: 25kB
      ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)
            ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)
                  ->  Index Scan using auth_web_events_fk1 on auth_web_events  (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)
                        Index Cond: (user_id_fk = 2)
                  ->  Index Scan using auth_user_pkey on auth_user  (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)
                        Index Cond: (id = 2)
            ->  Index Scan using customers_id_idx on customers  (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)
                  Index Cond: (id = auth_user.customer_id_fk)
    Planning time: 0.369 ms
    Execution time: 61.965 ms

I'm calling the function this way:

    SELECT * FROM get_web_events_by_userid(2);

The query plan for the function:

    Function Scan on get_web_events_by_userid  (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)
    Planning time: 0.038 ms
    Execution time: 279107.175 ms

EDIT: I just changed the parameters, and the issue persists.
EDIT2: Query plan with Erwin's answer:

    Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)
      Sort Key: w.id
      Sort Method: quicksort  Memory: 25kB
      ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)
            ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)
                  ->  Index Scan using auth_user_pkey on auth_user u  (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)
                        Index Cond: (id = 2)
                  ->  Index Scan using auth_web_events_fk1 on auth_web_events w  (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)
                        Index Cond: (user_id_fk = 2)
            ->  Index Scan using customers_id_idx on customers c  (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)
                  Index Cond: (id = u.customer_id_fk)
    Planning time: 0.541 ms
    Execution time: 0.101 ms

Solution

While rewriting your function I realized that you added column aliases here:

    SELECT ... auth_user.email AS user, customers.name AS customer,

... which wouldn't do anything to begin with, since those aliases are invisible outside the function and not referenced inside it, so they are simply ignored. For documentation purposes, better use a comment.
It is also risky, because user is a completely reserved word; the safe way to use it as a column alias is to double-quote it.

Oddly, in my tests the function works with the unquoted alias. That is because PostgreSQL accepts any key word, even a reserved one, as an output column label when the explicit AS is written. Still, a double-quoted name or a plain comment is the clearer choice.
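A minimal sketch to check this in psql (the e-mail literal is made up). With an explicit AS, the reserved word is accepted as a label. Double-quoting makes it an ordinary identifier. An unquoted user inside an expression still means the current role, not a column.

```sql
-- Explicit AS: PostgreSQL accepts even a reserved key word as an output label.
SELECT 'alice@example.com' AS user;

-- Double-quoted: an ordinary identifier, the unambiguous form.
SELECT 'alice@example.com' AS "user";

-- Referring to it later requires quotes; a bare user in an expression is CURRENT_USER.
SELECT s."user" AS from_column,
       user     AS current_role_name
FROM  (SELECT 'alice@example.com' AS "user") s;
```

This is why a comment (as in the rewrite below) documents the intended name without any of this ambiguity.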

Your function rewritten (otherwise equivalent):

    CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
      RETURNS TABLE(
         id          int
       , time_stamp  timestamptz
       , description text
       , origin      text
       , userlogin   text
       , customer    text
       , client_ip   inet
      ) AS
    $func$
    SELECT w.id
         , w.time_stamp
         , w.description
         , w.origin
         , u.email     -- AS user -- make this a comment!
         , c.name      -- AS customer
         , w.client_ip
    FROM   public.auth_user       u
    JOIN   public.auth_web_events w ON w.user_id_fk = u.id
    JOIN   public.customers       c ON c.id = u.customer_id_fk
    WHERE  u.id = $1               -- reverted the logic here
    ORDER  BY w.id DESC
    $func$ LANGUAGE sql STABLE;

Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe; the setting does not normally benefit a single, isolated function call. Read the details in the manual. Also, standard EXPLAIN does not show query plans for what's going on inside functions. You could employ the additional module auto_explain for that:

Postgres query plan of a UDF invocation written in pgpsql
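A minimal session setup for that is sketched below. It assumes auto_explain is installed with the contrib modules. Loading it and changing its settings typically requires superuser rights. log_nested_statements is the setting that exposes statements executed inside functions.

```sql
-- Load the contrib module into the current session only.
LOAD 'auto_explain';

SET auto_explain.log_min_duration = 0;          -- log the plan of every statement, however fast
SET auto_explain.log_analyze = true;            -- include actual times and row counts (adds overhead)
SET auto_explain.log_nested_statements = true;  -- also log statements run inside functions
SET client_min_messages = log;                  -- echo LOG messages to this client, not only the server log

-- Then run the statement to inspect, e.g.:
-- SELECT * FROM get_web_events_by_userid(2);
```

With these settings, the plan of the query inside the function is reported alongside the outer Function Scan plan.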

But as we have worked out in the follow-up question, changing the function volatility to STABLE allows the planner to inline a simple SELECT into the outer statement, effectively eliminating the function from the picture. This explains why EXPLAIN shows the full query plan after adding STABLE (instead of a single Function Scan), and also why that plan is different.

Why is the planner coming up with different results for functions with different volatilities?
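The effect can be reproduced on a throwaway table. This is a minimal sketch: demo_events, events_volatile and events_stable are hypothetical names. Both functions have the same body and differ only in the volatility label. The VOLATILE one shows up as an opaque Function Scan, while the STABLE one is inlined, so its plan shows the scan of demo_events directly.

```sql
-- Hypothetical stand-in for auth_web_events.
DROP TABLE IF EXISTS demo_events;
CREATE TABLE demo_events (
    id         int PRIMARY KEY,
    user_id_fk int NOT NULL
);
INSERT INTO demo_events
SELECT g, g % 100 FROM generate_series(1, 10000) AS g;
CREATE INDEX ON demo_events (user_id_fk);
ANALYZE demo_events;

-- Same body twice; only the volatility label differs.
CREATE OR REPLACE FUNCTION events_volatile(int)
  RETURNS TABLE (id int) AS
$func$
SELECT e.id FROM demo_events e WHERE e.user_id_fk = $1 ORDER BY e.id DESC
$func$ LANGUAGE sql VOLATILE;

CREATE OR REPLACE FUNCTION events_stable(int)
  RETURNS TABLE (id int) AS
$func$
SELECT e.id FROM demo_events e WHERE e.user_id_fk = $1 ORDER BY e.id DESC
$func$ LANGUAGE sql STABLE;

-- Not inlined: the plan is a single "Function Scan on events_volatile" node.
EXPLAIN SELECT * FROM events_volatile(2);

-- Inlined: the plan shows scans of demo_events instead of a Function Scan.
EXPLAIN SELECT * FROM events_stable(2);
```

On the question's data, the same comparison would be EXPLAIN ANALYZE SELECT * FROM get_web_events_by_userid(2); run before and after declaring the function STABLE.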

You have a very odd data distribution:

auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record

Since you didn't define otherwise, the planner assumes the function returns an estimated 1000 rows. But your function actually returns only 2 rows. If all your calls only return (in the vicinity of) 2 rows, just declare that with an added ROWS 2. That might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice here anyway).
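A minimal sketch of the ROWS setting. two_events is a hypothetical stand-in for get_web_events_by_userid that always returns two rows, and it is kept VOLATILE so that it is not inlined and its row estimate is actually used. ALTER FUNCTION changes the estimate without re-creating the function. The catalog column pg_proc.prorows stores the estimate.

```sql
-- Hypothetical stand-in: a VOLATILE set-returning SQL function returning two rows.
CREATE OR REPLACE FUNCTION two_events(int)
  RETURNS TABLE (id int) AS
$func$
SELECT g FROM generate_series($1, $1 + 1) AS g
$func$ LANGUAGE sql VOLATILE;

-- Without a ROWS clause, a set-returning function is estimated at 1000 rows.
SELECT prorows FROM pg_proc WHERE oid = 'two_events(int)'::regprocedure;

-- Declare a realistic estimate instead.
ALTER FUNCTION two_events(int) ROWS 2;

-- The Function Scan node now carries rows=2 in its cost estimate.
EXPLAIN SELECT * FROM two_events(1);
```

For the function in the question, both suggestions fit in one statement: ALTER FUNCTION get_web_events_by_userid(int) STABLE ROWS 2; Once the function is STABLE and inlined, the planner estimates the inlined query directly, so the ROWS value then matters little.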
