How to pass a set of rows from one function into another?

Problem description

Overview

I'm using PostgreSQL 9.1.14, and I'm trying to pass the results of a function into another function. The general idea (specifics, with a minimal example, follow) is that we can write:

select * from (select * from foo ...) as sub

and we can abstract the sub-select away in a function and select from it:

create function foos() 
returns setof foo
language sql as $$
  select * from foo ...
$$;

select * from foos()

Is there some way to abstract one level further, so as to be able to do something like this (I know functions cannot actually have arguments with setof types):

create function more_foos( some_foos setof foo )
returns setof foo
language sql as $$
  select * from some_foos ...  -- or unnest(some_foos), or ???
$$;

select * from more_foos(foos())

Minimal Example and Attempted Workarounds

I'm using PostgreSQL 9.1.14. Here's a minimal example:

-- 1. create a table x with three rows
drop table if exists x cascade;
create table if not exists x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');

-- 2. xs() is a function with type `setof x`
create or replace function xs()
returns setof x
language sql as $$
  select * from x
$$;

-- 3. xxs() should return the contents of x, too
--    Ideally the argument would be a `setof x`,
--    but that's not allowed (see below).
create or replace function xxs(x[])  
returns setof x
language sql as $$
  select x.* from x
  join unnest($1) y
       on x.id = y.id
$$;

When I load up this code, I get the expected output for the table definitions, and I can call and select from xs() as I'd expect. But when I try to pass the result of xs() to xxs(), I get an error that "function xxs(x) does not exist":

db=> \i test.sql 
DROP TABLE
CREATE TABLE
INSERT 0 3
CREATE FUNCTION
CREATE FUNCTION

db=> select * from xs();
  1 | a
  2 | b
  3 | c

db=> select * from xxs(xs());
ERROR:  function xxs(x) does not exist
LINE 1: select * from xxs(xs());
                      ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
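The error names xxs(x) because a set-returning function such as xs() in argument position is not handed over as one setof x value. It is expanded row by row, so PostgreSQL looks for an xxs() that takes a single x. Suppose the goal is simply to call the question's xxs(x[]) with every row of xs(). A minimal sketch that keeps that signature gathers the rows into an array first with an ARRAY(subquery) constructor. It does not need LATERAL, so it should also run on 9.1:

```sql
-- Setup copied from the question's test.sql.
drop table if exists x cascade;
create table x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');

create or replace function xs()
returns setof x
language sql as $$
  select * from x
$$;

-- The question's array-taking version of xxs(), unchanged.
create or replace function xxs(x[])
returns setof x
language sql as $$
  select x.* from x
  join unnest($1) y
       on x.id = y.id
$$;

-- ARRAY(subquery) collects every row produced by xs() into one x[]
-- value, which matches the xxs(x[]) signature.
select * from xxs(array(select xs()));
```

The whole set is materialized as one array value, so this suits modest row counts better than very large ones.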

I'm a bit confused about "function xxs(x) does not exist". Since the return type of xs() was setof x, I'd expected the argument passed to xxs() to be a setof x (or maybe an x[]), not an x. Following the complaints about the type, I can get to either of the following definitions. With either one I can run select xxs(xs());, but I can't run select * from xxs(xs());.

create or replace function xxs( x )
returns setof x
language sql as $$
  select x.* from x
  join unnest(array[$1]) y    -- unnest(array[...]) seems pretty bad
       on x.id = y.id
$$;

create or replace function xxs( x )
returns setof x
language sql as $$
  select * from x
         where x.id in ($1.id)
$$;

db=> select xxs(xs());
 (1,a)
 (2,b)
 (3,c)

db=> select * from xxs(xs());
ERROR:  set-valued function called in context that cannot accept a set
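Since select xxs(xs()) already works on 9.1, one workaround sketch is to run that call in the select list of a subquery. This assumes the question's second, single-row xxs(x) definition. The sketch names the composite result and expands it with (r).* one level up, so the caller still gets ordinary id and name columns:

```sql
-- Setup copied from the question's test.sql.
drop table if exists x cascade;
create table x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');

create or replace function xs()
returns setof x
language sql as $$
  select * from x
$$;

-- The question's second single-row version of xxs().
create or replace function xxs( x )
returns setof x
language sql as $$
  select * from x
         where x.id in ($1.id)
$$;

-- Set-returning functions are allowed in the select list, so call them
-- there, name the composite result r, and expand it in the outer query.
select (r).*
from (select xxs(xs()) as r) as t;
```

The subquery matters. As far as I know, writing (xxs(xs())).* directly can make older releases evaluate the function once per output column. A subquery with a set-returning function in its select list is not flattened into the outer query, which avoids that.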

Summary

What's the right way to pass the results of a set-returning function into another function? (I have noted that create function … xxs( setof x ) … results in the error: ERROR: functions cannot accept set arguments, so the answer won't literally be passing a set of rows from one function to another.)

Solution

Table functions

I perform very high speed, complex database migrations for a living, using SQL as both the client and server language (no other language is used), all running server side, where the code rarely surfaces from the database engine. Table functions play a HUGE role in my work. I don't use "cursors" since they are too slow to meet my performance requirements, and everything I do is result set oriented. Table functions have been an immense help in eliminating cursors from my work entirely and in achieving very high speed. They have also contributed dramatically to reducing code volume and improving simplicity.

In short, you use a query that references two (or more) table functions to pass the data from one table function to the next. The result set of the select query that calls the table functions serves as the conduit that carries the data from one table function to the next. On the DB2 platform and version I work on, you can only pass a single row of column values as input to a table function call, as you've discovered. Based on a quick look at the 9.1 Postgres manual, the same appears to be true in Postgres. However, because the table function call happens in the middle of a query's result set processing, you achieve the same effect as passing a whole result set to each table function call. Within the database engine plumbing, though, the data is passed to each table function only one row at a time.
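In PostgreSQL this pattern relies on LATERAL, which was added in 9.3. On the question's 9.1, a function call in FROM cannot refer to columns of an earlier FROM item. Here is a minimal sketch for 9.3 or later that reuses the question's table x, xs(), and single-row xxs(x). xs() drives the query. Each of its rows (the whole-row value s, of type x) is passed to xxs(s) one at a time, and the output is joined back to that row:

```sql
-- Requires PostgreSQL 9.3+ for LATERAL.
drop table if exists x cascade;
create table x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');

create or replace function xs()
returns setof x
language sql as $$
  select * from x
$$;

-- Single-row version from the question: takes one x, returns matching rows.
create or replace function xxs( x )
returns setof x
language sql as $$
  select * from x
         where x.id in ($1.id)
$$;

-- For each row s produced by xs(), xxs(s) is called and its result
-- rows are joined to s.
select s.id as from_xs, y.*
from xs() as s
cross join lateral xxs(s) as y;
```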

Table functions accept one row of input columns and return a single result set back into the calling query (i.e. the select that called the function). The result set columns passed back from a table function become part of the calling query's result set. They are therefore available as input to the next table function referenced later in the same query, typically as a subsequent join. The first table function's result columns are fed as input (one row at a time) to the second table function, which returns its result set columns into the calling query's result set. Both the first and second table functions' result set columns are now part of the calling query's result set, and are available as input (one row at a time) to a third table function. Each table function call widens the calling query's result set via the columns it returns. This can go on and on until you start hitting limits on the width of a result set, which likely vary from one database engine to the next.
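The widening effect can be sketched in PostgreSQL 9.3+ with three small table functions. The names step_one, step_two, and step_three and their arithmetic are hypothetical, chosen only to make the data flow visible. Each function receives columns produced earlier in the same FROM list and adds one column of its own:

```sql
-- Hypothetical table functions for illustration; PostgreSQL 9.3+ (LATERAL).
create or replace function step_one(int)
returns table (doubled int)
language sql as $$ select $1 * 2 $$;

create or replace function step_two(int)
returns table (plus_ten int)
language sql as $$ select $1 + 10 $$;

-- Takes the outputs of both earlier functions.
create or replace function step_three(int, int)
returns table (total int)
language sql as $$ select $1 + $2 $$;

-- Each lateral join feeds one row at a time into the next function
-- and widens that row with the function's output column.
select v.n, a.doubled, b.plus_ten, c.total
from (values (1), (2), (3)) as v(n)
cross join lateral step_one(v.n) as a(doubled)
cross join lateral step_two(a.doubled) as b(plus_ten)
cross join lateral step_three(a.doubled, b.plus_ten) as c(total)
order by v.n;
```

For n = 1 the row should come out as 1, 2, 12, 14. step_one doubles 1 to 2, step_two adds 10 to get 12, and step_three adds the two to get 14.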

Consider this example (which may not match Postgres' syntax requirements or capabilities, as I work on DB2). It is one of many design patterns in which I use table functions, and one of the simpler ones. I think it is very illustrative, and I anticipate it would have broad appeal if table functions were in heavy mainstream use (to my knowledge they are not, but I think they deserve more attention than they are getting).

In this example, the table functions in use are VALIDATE_TODAYS_ORDER_BATCH, POST_TODAYS_ORDER_BATCH, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH. On the DB2 version I work on, you wrap the table function inside "TABLE( place table function call and parameters here )". Based on a quick look at a Postgres manual, it appears you omit the "TABLE( )" wrapper there.

create table TODAYS_ORDER_PROCESSING_EXCEPTIONS as (

select      TODAYS_ORDER_BATCH.*
           ,VALIDATION_RESULT.ROW_VALID
           ,POST_RESULT.ROW_POSTED
           ,WAREHOUSE_RESULT.ROW_WAREHOUSED

from        TODAYS_ORDER_BATCH

cross join  VALIDATE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function]  ) 
              as VALIDATION_RESULT ( ROW_VALID )  --example: 1/0 true/false Boolean returned

left join   POST_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
              as POST_RESULT ( ROW_POSTED )  --example: 1/0 true/false Boolean returned
      on    ROW_VALID = '1'

left join   DATA_WAREHOUSE_TODAYS_ORDER_BATCH ( ORDER_NUMBER, [either pass the remainder of the order columns or fetch them in the function] )
              as WAREHOUSE_RESULT ( ROW_WAREHOUSED )  --example: 1/0 true/false Boolean returned
      on    ROW_POSTED = '1'

where       coalesce( ROW_VALID,      '0' ) = '0'   --Capture only exceptions and unprocessed work.  
      or    coalesce( ROW_POSTED,     '0' ) = '0'   --Or, you can flip the logic to capture only successful rows.
      or    coalesce( ROW_WAREHOUSED, '0' ) = '0'

) with data

  1. If table TODAYS_ORDER_BATCH contains 1,000,000 rows, then VALIDATE_TODAYS_ORDER_BATCH will be called 1,000,000 times, once for each row.
  2. If 900,000 rows pass validation inside VALIDATE_TODAYS_ORDER_BATCH, then POST_TODAYS_ORDER_BATCH will be called 900,000 times.
  3. If only 850,000 rows successfully post, then VALIDATE_TODAYS_ORDER_BATCH needs some loopholes closed LOL, and DATA_WAREHOUSE_TODAYS_ORDER_BATCH will be called 850,000 times.
  4. If 850,000 rows successfully made it into the Data Warehouse (i.e. no additional exceptions were generated), then table TODAYS_ORDER_PROCESSING_EXCEPTIONS will be populated with 1,000,000 - 850,000 = 150,000 exception rows.
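A PostgreSQL 9.3+ translation of the same pattern might look like the sketch below. It is a runnable approximation under stated assumptions, not the answer's DB2 code. The amount column, the boolean flags (used instead of '1'/'0'), and the stub function bodies are hypothetical placeholders for real validation, posting, and warehousing logic. LATERAL lets each function call refer to columns of the batch row, and no TABLE( ) wrapper is needed. The stubs have no side effects, so the sketch does not depend on how many times PostgreSQL actually calls each one.

```sql
-- PostgreSQL 9.3+ (LATERAL). Column and function bodies are hypothetical
-- stand-ins for the DB2 example above.
drop table if exists todays_order_processing_exceptions;
drop table if exists todays_order_batch;
create table todays_order_batch (order_number int, amount numeric);
insert into todays_order_batch
values (1, 10), (2, -5), (3, 20), (4, 99999);

-- Stub validator: positive amounts are valid.
create or replace function validate_todays_order_batch(int, numeric)
returns table (row_valid boolean)
language sql as $$ select $2 > 0 $$;

-- Stub poster: pretend very large orders fail to post.
create or replace function post_todays_order_batch(int, numeric)
returns table (row_posted boolean)
language sql as $$ select $2 < 10000 $$;

-- Stub warehouse loader: always succeeds.
create or replace function data_warehouse_todays_order_batch(int)
returns table (row_warehoused boolean)
language sql as $$ select true $$;

-- One statement drives the whole batch; each function sees one order
-- row at a time and adds its flag column to that row.
create table todays_order_processing_exceptions as
select b.*, v.row_valid, p.row_posted, w.row_warehoused
from todays_order_batch as b
cross join lateral validate_todays_order_batch(b.order_number, b.amount)
     as v (row_valid)
left join lateral post_todays_order_batch(b.order_number, b.amount)
     as p (row_posted) on v.row_valid
left join lateral data_warehouse_todays_order_batch(b.order_number)
     as w (row_warehoused) on p.row_posted
-- Keep only exceptions and unprocessed work.
where not coalesce(v.row_valid, false)
   or not coalesce(p.row_posted, false)
   or not coalesce(w.row_warehoused, false);
```

With this sample data, todays_order_processing_exceptions should contain only orders 2 (fails validation) and 4 (fails posting).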

The table function calls in this example are only returning a single column, but they could be returning many columns. For example, the table function validating an order row could return the reason why an order failed validation.
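For instance, a hypothetical validate_order function could return both a pass/fail flag and a failure reason, and both columns land in the calling query's rows. This needs PostgreSQL 9.3+, and the name, the amount column, and the rules are assumptions for illustration:

```sql
-- Hypothetical order table and validator; PostgreSQL 9.3+ (LATERAL).
drop table if exists todays_order_batch cascade;
create table todays_order_batch (order_number int, amount numeric);
insert into todays_order_batch values (1, 10), (2, -5), (3, null);

-- One input row in, one two-column row out: a flag plus a reason
-- (the reason stays null when the order is valid).
create or replace function validate_order(numeric)
returns table (row_valid boolean, failure_reason text)
language sql as $$
  select $1 is not null and $1 > 0,
         case
           when $1 is null then 'missing amount'
           when $1 <= 0   then 'non-positive amount'
         end
$$;

select b.order_number, v.row_valid, v.failure_reason
from todays_order_batch as b
cross join lateral validate_order(b.amount) as v
order by b.order_number;
```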

In this design, virtually all the chatter between a high-level language (HLL) program and the database is eliminated, since the HLL requestor asks the database to process the whole batch in ONE request. This removes millions of SQL requests to the database and millions of HLL procedure or method calls, and as a result provides a HUGE runtime improvement. In contrast, legacy code that processes a single row at a time would typically send 1,000,000 fetch SQL requests (1 for each row in TODAYS_ORDER_BATCH). It would then add at least 1,000,000 HLL and/or SQL requests for validation, at least 1,000,000 more for posting, and 1,000,000 more for sending the orders to the data warehouse. Granted, with this table function design, SQL requests are still sent to the database from inside the table functions. But when the database makes requests to itself (i.e. from inside a table function), those requests are serviced much faster. The difference is especially large compared to a legacy scenario where the HLL requestor does single row processing from a remote system, with the worst case over a WAN (OMG please don't do that).

You can easily run into performance problems if you use a table function to "fetch a result set" and then join that result set to other tables. In that case, the SQL optimizer can't predict what set of rows will be returned from the table function, and therefore it can't optimize the join to subsequent tables. For that reason, I rarely use them for fetching a result set, unless I know that result set will be a very small number of rows, hence not causing a performance problem, or I don't need to join to subsequent tables.
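PostgreSQL offers a partial mitigation. CREATE FUNCTION accepts a ROWS clause, allowed only for set-returning functions, that tells the planner how many rows to expect instead of its default assumption of 1000. It is still a fixed guess rather than a real prediction, so the caution above stands. Here is a sketch that reuses the question's table x and xs():

```sql
drop table if exists x cascade;
create table x (id int, name text);
insert into x values (1,'a'), (2,'b'), (3,'c');

-- ROWS 3 replaces the planner's default estimate of 1000 rows
-- for this set-returning function.
create or replace function xs()
returns setof x
language sql
rows 3
as $$
  select * from x
$$;

-- Inspect the plan to see the estimate used for the function scan.
explain select * from xs() as s join x on x.id = s.id;
```

The function scan in the EXPLAIN output should carry the rows=3 estimate, provided the planner does not inline the function. Plain SQL functions declared STABLE or IMMUTABLE may be inlined and planned like a subquery instead.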

In my opinion, one reason why table functions are underutilized is that they are often perceived as only a tool to fetch a result set, which often performs poorly, so they get written off as a "poor" tool to use.

Table functions are immensely useful for pushing more functionality over to the server, for eliminating most of the chatter between the database server and programs on remote systems, and even for eliminating chatter between the database server and external programs on the same server. Even chatter between programs on the same server carries more overhead than many people realize, and much of it is unnecessary. The heart of the power of table functions lies in using them to perform actions inside result set processing.

There are more advanced design patterns for using table functions that build on the above pattern, where you can maximize result set processing even further, but this post is a lot for most to absorb already.
