SQL function very slow compared to query without function wrapper
Problem description
I have this PostgreSQL 9.4 query that runs very fast (~12 ms):
SELECT
auth_web_events.id ,
auth_web_events.time_stamp,
auth_web_events.description,
auth_web_events.origin,
auth_user.email,
customers.name,
auth_web_events.client_ip
FROM
public.auth_web_events,
public.auth_user,
public.customers
WHERE
auth_web_events.user_id_fk = auth_user.id AND
auth_user.customer_id_fk = customers.id AND
auth_web_events.user_id_fk = 2
ORDER BY
auth_web_events.id DESC;
But if I embed it into a function, the query runs very slowly and seems to run through every record. What am I missing? I have about 1M rows of data, and I want to simplify my database layer by storing the large queries in functions and views.
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int) RETURNS TABLE(
    id int,
    time_stamp timestamp with time zone,
    description text,
    origin text,
    userlogin text,
    customer text,
    client_ip inet
) AS
$func$
SELECT
    auth_web_events.id,
    auth_web_events.time_stamp,
    auth_web_events.description,
    auth_web_events.origin,
    auth_user.email AS user,
    customers.name AS customer,
    auth_web_events.client_ip
FROM
    public.auth_web_events,
    public.auth_user,
    public.customers
WHERE
    auth_web_events.user_id_fk = auth_user.id AND
    auth_user.customer_id_fk = customers.id AND
    auth_web_events.user_id_fk = $1
ORDER BY
    auth_web_events.id DESC;
$func$ LANGUAGE SQL;
The query plan is:
Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=61.905..61.906 rows=2 loops=1)
  Sort Key: auth_web_events.id
  Sort Method: quicksort  Memory: 25kB
  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=61.884..61.893 rows=2 loops=1)
        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=61.874..61.879 rows=2 loops=1)
              ->  Index Scan using auth_web_events_fk1 on auth_web_events  (cost=0.57..4.58 rows=1 width=61) (actual time=61.860..61.860 rows=2 loops=1)
                    Index Cond: (user_id_fk = 2)
              ->  Index Scan using auth_user_pkey on auth_user  (cost=0.14..8.16 rows=1 width=524) (actual time=0.005..0.005 rows=1 loops=2)
                    Index Cond: (id = 2)
        ->  Index Scan using customers_id_idx on customers  (cost=0.14..8.16 rows=1 width=222) (actual time=0.004..0.005 rows=1 loops=2)
              Index Cond: (id = auth_user.customer_id_fk)
Planning time: 0.369 ms
Execution time: 61.965 ms
I'm calling the function this way:
SELECT * from get_web_events_by_userid(2)
The query plan for the function:
Function Scan on get_web_events_by_userid  (cost=0.25..10.25 rows=1000 width=172) (actual time=279107.142..279107.144 rows=2 loops=1)
Planning time: 0.038 ms
Execution time: 279107.175 ms
EDIT: I just changed the parameters, and the issue persists.
EDIT2: Query plan for Erwin's answer:
Sort  (cost=20.94..20.94 rows=1 width=791) (actual time=0.048..0.049 rows=2 loops=1)
  Sort Key: w.id
  Sort Method: quicksort  Memory: 25kB
  ->  Nested Loop  (cost=0.85..20.93 rows=1 width=791) (actual time=0.030..0.037 rows=2 loops=1)
        ->  Nested Loop  (cost=0.71..12.75 rows=1 width=577) (actual time=0.023..0.025 rows=2 loops=1)
              ->  Index Scan using auth_user_pkey on auth_user u  (cost=0.14..8.16 rows=1 width=524) (actual time=0.011..0.012 rows=1 loops=1)
                    Index Cond: (id = 2)
              ->  Index Scan using auth_web_events_fk1 on auth_web_events w  (cost=0.57..4.58 rows=1 width=61) (actual time=0.008..0.008 rows=2 loops=1)
                    Index Cond: (user_id_fk = 2)
        ->  Index Scan using customers_id_idx on customers c  (cost=0.14..8.16 rows=1 width=222) (actual time=0.003..0.004 rows=1 loops=2)
              Index Cond: (id = u.customer_id_fk)
Planning time: 0.541 ms
Execution time: 0.101 ms
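While rewriting your function I realized that you added column aliases here:
SELECT
    ...
    auth_user.email AS user,
    customers.name AS customer,
These wouldn't do anything to begin with: the aliases are invisible outside the function and not referenced inside it, so they would be ignored. For documentation purposes, better use a comment.
But it also makes your query invalid, because user is a completely reserved word and cannot be used as column alias unless double-quoted.
Oddly, in my tests the function seems to work with the invalid alias. Probably because it is ignored (?). But I am not sure this couldn't have side effects.
Your function rewritten (otherwise equivalent):
CREATE OR REPLACE FUNCTION get_web_events_by_userid(int)
  RETURNS TABLE(
     id int
   , time_stamp timestamptz
   , description text
   , origin text
   , userlogin text
   , customer text
   , client_ip inet
  ) AS
$func$
SELECT w.id
     , w.time_stamp
     , w.description
     , w.origin
     , u.email     -- AS user   -- make this a comment!
     , c.name      -- AS customer
     , w.client_ip
FROM   public.auth_user       u
JOIN   public.auth_web_events w ON w.user_id_fk = u.id
JOIN   public.customers       c ON c.id = u.customer_id_fk
WHERE  u.id = $1              -- reverted the logic here
ORDER  BY w.id DESC
$func$ LANGUAGE sql STABLE;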
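Obviously, the STABLE keyword changed the outcome. Function volatility should not be an issue in the test situation you describe. The setting does not normally profit a single, isolated function call. Read details in the manual. Also, standard EXPLAIN does not show query plans for what's going on inside functions. You could employ the additional module auto_explain for that:
- Query plan of a UDF invocation written in pgpsql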
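As a rough sketch of how auto_explain could be used here (the settings chosen are an assumption about what is useful for this case, not something stated in the thread), the module can be loaded for a single session so that plans of statements run inside the function are written to the server log:

```sql
-- Load the module for the current session only (requires superuser).
LOAD 'auto_explain';

-- Log the plan of every statement, regardless of how long it runs.
SET auto_explain.log_min_duration = 0;
-- Include actual run times and row counts, as EXPLAIN ANALYZE does.
SET auto_explain.log_analyze = on;
-- Also log statements executed inside functions, not only top-level ones.
SET auto_explain.log_nested_statements = on;

-- The plan of the query in the function body is written to the server log.
SELECT * FROM get_web_events_by_userid(2);
```

The plans appear in the PostgreSQL server log, not in the client output. Logging with log_analyze adds timing overhead, so this is better suited to a debugging session than to permanent use.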
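But as we have worked out in the follow-up question, changing the function volatility to STABLE allows Postgres to inline a simple SELECT in the outer statement, thus effectively eliminating the function from the picture. This explains why we see the query plan after adding STABLE, and it also explains why the query plan is different.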
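One way to check whether a call is being inlined, sketched here under the assumption that the function already exists as defined in the question, is to change only its volatility and compare the EXPLAIN output before and after:

```sql
-- Mark the existing function STABLE without redefining its body.
ALTER FUNCTION get_web_events_by_userid(int) STABLE;

-- If the call is inlined, the plan shows scans on the underlying tables
-- instead of a single "Function Scan on get_web_events_by_userid" node.
EXPLAIN ANALYZE
SELECT * FROM get_web_events_by_userid(2);
```

Inlining also depends on other properties of the function (for example, a LANGUAGE sql body consisting of a single SELECT), so a remaining Function Scan node does not necessarily mean the volatility setting is at fault.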
You have a very odd data distribution:
auth_web_events table has 100000000 records, auth_user->2 records, customers-> 1 record
Since you didn't define otherwise, the function assumes an estimate of 1000 rows to be returned. But your function is actually returning only 2 rows. If all your calls only return (in the vicinity of) 2 rows, just declare that with an added ROWS 2. That might change the query plan for the VOLATILE variant as well (even if STABLE is the right choice anyway here).
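A minimal sketch of declaring the row estimate, assuming the function from the question is already in place and is not being inlined:

```sql
-- Tell the planner to expect about 2 rows per call instead of the default 1000.
ALTER FUNCTION get_web_events_by_userid(int) ROWS 2;

-- The row estimate on the Function Scan node should now reflect the declaration.
EXPLAIN
SELECT * FROM get_web_events_by_userid(2);
```

The same clause can be written directly in CREATE OR REPLACE FUNCTION, next to LANGUAGE sql. When the function is inlined, the planner estimates from the underlying tables, and the ROWS declaration no longer matters for that call.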