Refactor a PL/pgSQL function to return the output of various SELECT queries


Problem description

I wrote a function that outputs a PostgreSQL SELECT query well formed in text form. Now I don't want to output a text anymore, but actually run the generated SELECT statement against the database and return the result - just like the query itself would.

What I have so far:

CREATE OR REPLACE FUNCTION data_of(integer)
  RETURNS text AS
$BODY$
DECLARE
   sensors varchar(100);   -- holds list of column names
   type    varchar(100);   -- holds name of table
   result  text;           -- holds SQL query
       -- declare more variables

BEGIN
      -- do some crazy stuff

      result := 'SELECT
Datahora,' || sensors ||
      '

FROM
' || type ||
      '

WHERE
id=' || $1 ||'

ORDER BY Datahora;';

      RETURN result;
END;
$BODY$
LANGUAGE 'plpgsql' VOLATILE;
ALTER FUNCTION data_of(integer) OWNER TO postgres;

sensors holds the list of column names for the table type. Those are declared and filled in the course of the function. Eventually, they hold values like:

  • sensors: 'column1, column2, column3'
    Except for Datahora (timestamp) all columns are of type double precision.

  • type :'myTable'
    Can be the name of one of four tables. Each has different columns, except for the common column Datahora.

Definition of the underlying tables.

The variable sensors will hold all columns displayed here for the corresponding table in type. For example: If type is pcdmet then sensors will be 'datahora,dirvento,precipitacao,pressaoatm,radsolacum,tempar,umidrel,velvento'

The variables are used to build a SELECT statement that is stored in result. Like:

SELECT Datahora, column1, column2, column3
FROM   myTable
WHERE  id=20
ORDER  BY Datahora;

Right now, my function returns this statement as text. I copy-paste and execute it in pgAdmin or via psql. I want to automate this, run the query automatically and return the result. How can I do that?

Solution

Dynamic SQL and RETURN type

(I saved the best for last, keep reading!)
You want to execute dynamic SQL. In principle, that's simple in PL/pgSQL with the help of EXECUTE. You don't need a cursor. In fact, most of the time you are better off without explicit cursors.

The problem you run into: you want to return records of yet undefined type. A function needs to declare its return type in the RETURNS clause (or with OUT or INOUT parameters). In your case you would have to fall back to anonymous records, because number, names and types of returned columns vary. Like:

CREATE FUNCTION data_of(integer)
  RETURNS SETOF record AS ...

However, this is not particularly useful. You have to provide a column definition list with every call. Like:

SELECT * FROM data_of(17)
AS foo (column_name1 integer
      , column_name2 text
      , column_name3 real);

But how would you even do this when you don't know the columns beforehand?
You could use less structured document data types like json, jsonb, hstore or xml.
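To illustrate the document-type route, here is a minimal sketch that returns each row as a single jsonb value. The function name data_of_json and the regclass parameter are my own assumptions for illustration, not part of the question, and to_jsonb() requires PostgreSQL 9.5 or later:

```sql
-- Hypothetical variant: one jsonb document per row, so the caller
-- needs no column definition list. Assumes the table has the columns
-- id and datahora, like the tables in the question.
CREATE OR REPLACE FUNCTION data_of_json(_tbl regclass, _id integer)
  RETURNS SETOF jsonb
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT to_jsonb(t)      -- whole row as one jsonb object
      FROM   %s t             -- regclass is rendered as a safely quoted name
      WHERE  t.id = $1
      ORDER  BY t.datahora'
    , _tbl)
   USING  _id;
END
$func$;

-- Example call
SELECT * FROM data_of_json('pcdmet', 17);
```

The price is that column names and types move out of the result set and into the document, so the client has to unpack them.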

But, for the purpose of this question, let's assume you want to return individual, correctly typed and named columns as much as possible.

Simple solution with fixed return type

The column datahora seems to be a given. I'll assume data type timestamp and that there are always two more columns with varying name and data type.

Names we'll abandon in favor of generic names in the return type.
Types we'll abandon, too, and cast all to text since every data type can be cast to text.

CREATE OR REPLACE FUNCTION data_of(_id integer)
  RETURNS TABLE (datahora timestamp, col2 text, col3 text)
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sensors text := 'col1::text, col2::text';  -- cast each col to text
   _type    text := 'foo';
BEGIN
   RETURN QUERY EXECUTE '
      SELECT datahora, ' || _sensors || '
      FROM   ' || quote_ident(_type) || '
      WHERE  id = $1
      ORDER  BY datahora'
   USING  _id;
END
$func$;

The variables _sensors and _type could be input parameters instead.

Note the RETURNS TABLE clause.

Note the use of RETURN QUERY EXECUTE. That is one of the more elegant ways to return rows from a dynamic query.

I use a name for the function parameter, just to make the USING clause of RETURN QUERY EXECUTE less confusing. $1 in the SQL-string does not refer to the function parameter but to the value passed with the USING clause. (Both happen to be $1 in their respective scope in this simple example.)
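A small sketch may make the two scopes easier to see. The function name datahora_of and its parameters are hypothetical; the point is only that the ordinal position in USING decides what $1 means inside the SQL string:

```sql
-- Hypothetical helper, for illustration only.
-- Assumes the target table has columns id and datahora (timestamp).
CREATE OR REPLACE FUNCTION datahora_of(_tbl text, _id integer)
  RETURNS SETOF timestamp
  LANGUAGE plpgsql AS
$func$
BEGIN
   -- For the function, _id is parameter $2.
   -- Inside the dynamic string it is $1, because it is the first
   -- (and only) value listed in the USING clause.
   RETURN QUERY EXECUTE format('
      SELECT datahora
      FROM   %I
      WHERE  id = $1
      ORDER  BY datahora'
    , _tbl)
   USING  _id;
END
$func$;
```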

Note the example value for _sensors: each column is cast to type text.

This kind of code is very vulnerable to SQL injection. I use quote_ident() to protect against it. Lumping together a couple of column names in the variable _sensors prevents the use of quote_ident() (and is typically a bad idea!). Ensure that no bad stuff can be in there some other way, for instance by individually running the column names through quote_ident() instead. A VARIADIC parameter comes to mind ...

Simpler since PostgreSQL 9.1

With version 9.1 or later you can use format() to further simplify:

RETURN QUERY EXECUTE format('
   SELECT datahora, %s  -- identifier passed as unescaped string
   FROM   %I            -- assuming the name is provided by user
   WHERE  id = $1
   ORDER  BY datahora'
  ,_sensors, _type)
USING  _id;

Again, the clean way would be to pass the column names individually and escape each one properly.
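A possible sketch of that idea, using a VARIADIC parameter so that every column name goes through %I on its own. The function name data_of_cols, its parameters, and the text[] return column are assumptions of mine; WITH ORDINALITY needs PostgreSQL 9.4 or later:

```sql
-- Hypothetical variant: column names arrive one by one and are
-- quoted individually instead of being lumped into one string.
CREATE OR REPLACE FUNCTION data_of_cols(_id integer, _type text, VARIADIC _cols text[])
  RETURNS TABLE (datahora timestamp, vals text[])
  LANGUAGE plpgsql AS
$func$
DECLARE
   _col_list text;
BEGIN
   -- Build "col1"::text, "col2"::text, ... preserving the argument order
   SELECT string_agg(format('%I::text', c), ', ' ORDER BY ord)
   INTO   _col_list
   FROM   unnest(_cols) WITH ORDINALITY AS u(c, ord);

   RETURN QUERY EXECUTE format('
      SELECT datahora, ARRAY[%s]
      FROM   %I
      WHERE  id = $1
      ORDER  BY datahora'
    , _col_list, _type)
   USING  _id;
END
$func$;

-- Example call with two of the column names from the question
SELECT * FROM data_of_cols(17, 'pcdmet', 'tempar', 'umidrel');
```

I return the values as a text array here because the number of columns now varies per call; with a fixed number of columns you could keep the RETURNS TABLE definition from above.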

Variable number of columns sharing the same type

After the updates to your question, it looks like your return type has

  • a variable number of columns
  • but all columns of the same type double precision (alias float8)

Use an ARRAY type in this case to nest a variable number of values. Additionally, I return an array with column names:

CREATE OR REPLACE FUNCTION data_of(_id integer)
  RETURNS TABLE (datahora timestamp, names text[], vals float8[])  -- "vals", because VALUES is a key word
  LANGUAGE plpgsql AS
$func$
DECLARE
   _sensors text := 'col1, col2, col3';  -- plain list of column names
   _type    text := 'foo';
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT datahora
           , string_to_array($1, '', '')  -- AS names; the delimiter is required
           , ARRAY[%s]                    -- AS vals
      FROM   %I                           -- table name, quoted as identifier
      WHERE  id = $2
      ORDER  BY datahora'
    , _sensors, _type)
   USING  _sensors, _id;
END
$func$;
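If you need one row per sensor on the calling side, the two arrays can be unnested in parallel. This is only a sketch: it assumes the array-returning data_of(integer) above is the installed version, and the aliases sensor_names, sensor_vals, sensor and val are mine. The multi-argument form of unnest() in the FROM clause requires PostgreSQL 9.4 or later:

```sql
-- Rename the result columns with an alias list, then expand both
-- arrays side by side: one output row per (timestamp, sensor) pair.
SELECT d.datahora, u.sensor, u.val
FROM   data_of(17) AS d(datahora, sensor_names, sensor_vals)
CROSS  JOIN LATERAL unnest(d.sensor_names, d.sensor_vals) AS u(sensor, val)
ORDER  BY d.datahora, u.sensor;
```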


Various complete table types

To actually return all columns of a table, there is a simple, powerful solution using a polymorphic type:

CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement, _id int)
  RETURNS SETOF anyelement
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT *
      FROM   %s  -- pg_typeof returns regtype, quoted automatically
      WHERE  id = $1
      ORDER  BY datahora'
    , pg_typeof(_tbl_type))
   USING  _id;
END
$func$;

Call (important!):

SELECT * FROM data_of(NULL::pcdmet, 17);

Replace pcdmet in the call with any other table name.

How does this work?

anyelement is a pseudo data type, a polymorphic type, a placeholder for any non-array data type. All occurrences of anyelement in the function evaluate to the same type provided at run time. By supplying a value of a defined type as argument to the function, we implicitly define the return type.

PostgreSQL automatically defines a row type (a composite data type) for every table created, so there is a well defined type for every table. This includes temporary tables, which is convenient for ad-hoc use.

Any type can be NULL. Hand in a NULL value, cast to the table type: NULL::pcdmet.

Now the function returns a well-defined row type and we can use SELECT * FROM data_of() to decompose the row and get individual columns.

pg_typeof(_tbl_type) returns the name of the table as object identifier type regtype. When automatically converted to text, identifiers are automatically double-quoted and schema-qualified if needed, defending against SQL injection automatically. This can even deal with schema-qualified table names where quote_ident() would fail.
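To see the quoting at work, you can try a temporary table with an awkward name. The table "Sensor Data" and its contents are made up for this sketch, which assumes the polymorphic data_of(anyelement, int) above is installed:

```sql
-- Hypothetical temp table whose name requires double quotes
CREATE TEMP TABLE "Sensor Data" (id int, datahora timestamp, tempar float8);
INSERT INTO "Sensor Data" VALUES (17, '2024-01-01 00:00', 21.5);

-- Inspect how the regtype value renders as text;
-- this is what format() splices in for %s
SELECT pg_typeof(NULL::"Sensor Data")::text AS rendered_name;

-- The function works unchanged for the new table
SELECT * FROM data_of(NULL::"Sensor Data", 17);
```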
