将多个值的集合或数组传递给一个函数 [英] Pass multiple sets or arrays of values to a function
问题描述
我正在PostgreSQL 9.3.10中编写一个PL/pgSQL函数,以返回谁参加了下表中的某些班级/会议:
I'm writing a PL/pgSQL function in PostgreSQL 9.3.10 to return who has attended certain classes/sessions from the following table:
Attendance
+-------+---------+---------+
| Class | Section | Name |
+-------+---------+---------+
| 1 | 1 | Amy |
| 1 | 1 | Bill |
| 1 | 2 | Charlie |
| 1 | 2 | Dan |
| 2 | 1 | Emily |
| 2 | 1 | Fred |
| 2 | 2 | George |
+-------+---------+---------+
我想要做的是,给定一组类/部分ID对(int[][]
),返回所有在那些类/部分中的人.例如,my_func(ARRAY[[1,1],[2,2]])
返回:
What I want to do is, given a array of class/section id pairs (int[][]
), return all people who are in those classes/sections. For example my_func(ARRAY[[1,1],[2,2]])
returns:
+-------+---------+---------+
| Class | Section | Name |
+-------+---------+---------+
| 1 | 1 | Amy |
| 1 | 1 | Bill |
| 2 | 2 | George |
+-------+---------+---------+
如果我事先知道这对,那就很简单了:
If I knew the pairs beforehand, it would be a simple:
SELECT * FROM attendance
WHERE ((class = 1 AND section = 1) OR (class = 2 AND section = 2));
相反,对将成为函数的参数.
Instead, the pairs will be a parameter of the function.
现在,我能想到的唯一方法是通过在查询末尾附加一堆WHERE
子句,然后调用EXECUTE
,使该函数本质上构建一个SQL查询字符串.有没有更好的方法来得到我的结果?
Right now, the only way I can think of doing this is to have the function essentially build an SQL query string by appending a bunch of WHERE
clauses to the end of the query and then calling EXECUTE
. Is there a better way to get my result?
我实现了该建议@Erwin的建议,目前能够得到我想要的结果.不幸的是,它似乎很慢.这是我正在运行的功能:
I implemented the suggestion @Erwin's suggestion and am currently able to get the results I want. Unfortunately, it appears as though it is fairly slow. Here is the function I am running:
CREATE OR REPLACE FUNCTION public.get_attendance(int[])
RETURNS TABLE(
class_c int,
section_c int
)
AS
$BODY$
BEGIN
RETURN QUERY
SELECT class, section
FROM generate_subscripts($1, 1) as i
INNER JOIN attendance ON attendance.class = $1[i][1]
AND attendance.section = $1[i][2];
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
这样查询:
SELECT * FROM get_attendance(ARRAY[[1,15],[2,15],[3,8]]);
我得到以下解释分析输出
Merge Join (cost=60.26..50139.72 rows=30840 width=8) (actual time=44.174..142.100 rows=25290 loops=1)
Merge Cond: ((attendance.class = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1])) AND (attendance.section = (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])))
-> Index Only Scan using class_section_idx on attendance (cost=0.43..43372.25 rows=1233588 width=8) (actual time=0.009..86.625 rows=1145046 loops=1)
Heap Fetches: 0
-> Sort (cost=59.83..62.33 rows=1000 width=4) (actual time=0.010..0.757 rows=10031 loops=1)
Sort Key: (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][1]), (('{{1,15},{2,15},{3,8}}'::integer[])[i.i][2])
Sort Method: quicksort Memory: 25kB
-> Function Scan on generate_subscripts i (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.007 rows=3 loops=1)
问题在于查询正在扫描出席者表中的所有出席者,而不过滤它们直到加入.有什么办法可以解决这个问题?
The problem is that the query is scanning through all attendances in the attendance table without filtering them until the join. Is there any way to fix this?
推荐答案
You can achieve that with a simple SQL function. Key feature is the function generate_subscripts()
:
CREATE OR REPLACE FUNCTION f_attendance(_arr2d int[])
RETURNS SETOF attendance AS
$func$
SELECT a.*
FROM generate_subscripts($1, 1) i
JOIN attendance a ON a.class = $1[i][1]
AND a.section = $1[i][2]
$func$ LANGUAGE ROWS 10 sql STABLE;
致电:
SELECT * FROM f_attendance(ARRAY[[1,1],[2,2]]);
或与数组 literal 相同-在某些情况下,特别是对于准备好的语句,这更方便:
Or the same with an array literal - which is more convenient in some contexts, especially with prepared statements:
SELECT * FROM f_attendance('{{1,1},{2,2}}');
函数总是需要2D数组.即使您传递了一对,也要嵌套它:
The function always expects a 2D array. Even if you pass a single pair, nest it:
SELECT * FROM f_attendance('{{1,1}}');
实施审核
-
您已经创建了功能
VOLATILE
,但是它可以是STABLE
. 每个文档:
You made the function
VOLATILE
, but it can beSTABLE
. Per documentation:
由于这种快照行为,可以安全地将仅包含
SELECT
命令的功能标记为STABLE
.
Because of this snapshotting behavior, a function containing only
SELECT
commands can safely be markedSTABLE
.
相关:
您还可以使用LANGUAGE plpgsql
而不是sql
,如果您在同一会话中多次执行该功能,则很有意义.但是,您还必须将其设置为STABLE
,否则您将失去潜在的性能优势. 手册再次:
You also use LANGUAGE plpgsql
instead of sql
, which makes sense if you execute the function multiple times in the same session. But then you must also make it STABLE
or you lose that potential performance benefit. The manual once more:
STABLE
和IMMUTABLE
函数使用从 调用查询的开始,而VOLATILE函数获得新的 他们执行的每个查询开始时的快照.
STABLE
andIMMUTABLE
functions use a snapshot established as of the start of the calling query, whereas VOLATILE functions obtain a fresh snapshot at the start of each query they execute.
您的EXPLAIN
输出显示仅索引扫描,而不是您在注释中怀疑的顺序扫描.
Your EXPLAIN
output shows an Index Only Scan, not a sequential scan like you suspect in your comment.
EXPLAIN
输出中还有一个排序步骤,与您显示的代码不匹配.您确定复制了正确的EXPLAIN
输出吗?无论如何,您是如何获得它的? PL/pgSQL函数是EXPLAIN
的黑匣子.您使用auto_explain
吗?详细信息:
There is also a sort step in your EXPLAIN
output that does not match the code you show. Are you sure you copied the right EXPLAIN
output? How did you obtain it anyway? PL/pgSQL functions are black boxes to EXPLAIN
. Did you use auto_explain
? Details:
Postgres查询计划程序不知道传递的参数将具有多少个数组元素,因此很难计划查询,并且它可能默认为顺序扫描(取决于更多因素).您可以通过声明预期的行数来提供帮助.如果您通常没有10个以上的项目,请像上面一样添加ROWS 10
.然后再次测试.
The Postgres query planner has no idea how many array elements the passed parameter will have, so it is hard to plan the query and it may default to a sequential scan (depending on more factors). You can help by declaring the expected number of rows. If you typically don't have more than 10 items add ROWS 10
like I did now above. And test again.
这篇关于将多个值的集合或数组传递给一个函数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!