How to return result of a SELECT inside a function in PostgreSQL?

Question

I have this function in PostgreSQL, but I don't know how to return the result of the query:

CREATE OR REPLACE FUNCTION wordFrequency(maxTokens INTEGER)
  RETURNS SETOF RECORD AS
$$
BEGIN
    SELECT text, count(*), 100 / maxTokens * count(*)
    FROM (
        SELECT text
        FROM token
        WHERE chartype = 'ALPHABETIC'
        LIMIT maxTokens
    ) as tokens
    GROUP BY text
    ORDER BY count DESC
END
$$
LANGUAGE plpgsql;

But I don't know how to return the result of the query inside the PostgreSQL function.

I found that the return type should be SETOF RECORD, right? But the return command is not right.

What is the right way to do this?

Recommended answer

Use RETURN QUERY:

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (txt   text   -- also visible as OUT parameter inside function
               , cnt   bigint
               , ratio bigint)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , count(*) AS cnt                 -- column alias only visible inside
        , (count(*) * 100) / _max_tokens  -- I added brackets
   FROM  (
      SELECT t.txt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY cnt DESC;                    -- potential ambiguity 
END
$func$;

Call:

SELECT * FROM word_frequency(123);

Defining the return type explicitly is much more practical than returning a generic record. This way you don't have to provide a column definition list with every function call. RETURNS TABLE is one way to do that. There are others. Data types of OUT parameters have to match exactly what is returned by the query.
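
One of the other ways is to declare OUT parameters explicitly and return SETOF record. The following is only a sketch under assumptions: it presumes the same token table with columns txt and chartype used in the answer, and the function name word_frequency_out is hypothetical.

```sql
-- Hypothetical variant: same result shape, declared with OUT parameters
CREATE OR REPLACE FUNCTION word_frequency_out(_max_tokens int
                                            , OUT txt   text
                                            , OUT cnt   bigint
                                            , OUT ratio bigint)
  RETURNS SETOF record             -- required form with more than one OUT parameter
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt
        , count(*)
        , (count(*) * 100) / _max_tokens
   FROM  (
      SELECT tk.txt
      FROM   token tk              -- assumed table, as in the answer
      WHERE  tk.chartype = 'ALPHABETIC'
      LIMIT  _max_tokens
      ) t
   GROUP  BY t.txt
   ORDER  BY 2 DESC;               -- sort by the second output column (the count)
END
$func$;

-- No column definition list needed, because the OUT parameters define the row type:
SELECT * FROM word_frequency_out(123);
```

By contrast, a function declared as plain RETURNS SETOF record without OUT parameters has to be called with a column definition list, along the lines of SELECT * FROM f(123) AS t(txt text, cnt bigint, ratio bigint).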

Choose names for OUT parameters carefully. They are visible in the function body almost anywhere. Table-qualify columns of the same name to avoid conflicts or unexpected results. I did that for all columns in my example.

But note the potential naming conflict between the OUT parameter cnt and the column alias of the same name. In this particular case (RETURN QUERY SELECT ...) Postgres uses the column alias over the OUT parameter either way. This can be ambiguous in other contexts, though. There are various ways to avoid any confusion:

  1. Use the ordinal position of the item in the SELECT list: ORDER BY 2 DESC.
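
As a small illustration of sorting by ordinal position, here is the inner query from the answer written as a standalone statement. It assumes the same token table with columns txt and chartype; inside the function body the ORDER BY clause would look the same.

```sql
SELECT t.txt
     , count(*) AS cnt
FROM   token t                       -- assumed table, as in the answer
WHERE  t.chartype = 'ALPHABETIC'
GROUP  BY t.txt
ORDER  BY 2 DESC;                    -- 2 = second item in the SELECT list, i.e. count(*)
```

Because the sort key is a position rather than a name, it cannot be confused with an OUT parameter called cnt.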

Don't use "text" or "count" as column names. Both are legal to use in Postgres, but "count" is a reserved word in standard SQL and a basic function name and "text" is a basic data type. Can lead to confusing errors. I use txt and cnt in my examples, you may want more explicit names.

Added a missing ; and corrected a syntax error in the header. (_max_tokens int), not (int maxTokens) - type after name.

While working with integer division, it's better to multiply first and divide later, to minimize the rounding error. Or work with numeric or a floating point type. See below.
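
The effect of the evaluation order is easy to see with made-up numbers (7 occurrences out of 300 tokens, chosen purely for illustration):

```sql
SELECT 100 / 300 * 7             AS divide_first    -- 100 / 300 truncates to 0, so the result is 0
     , (7 * 100) / 300           AS multiply_first  -- 700 / 300 truncates to 2
     , round(7 * 100.0 / 300, 2) AS as_numeric;     -- 100.0 is numeric, so the result is 2.33
```

The first expression mirrors the order used in the question (100 / maxTokens * count(*)), which collapses to 0 as soon as maxTokens exceeds 100.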

This is what I think your query should actually look like (calculating a relative share per token):

CREATE OR REPLACE FUNCTION word_frequency(_max_tokens int)
  RETURNS TABLE (txt            text
               , abs_cnt        bigint
               , relative_share numeric)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT t.txt, t.cnt
        , round((t.cnt * 100) / (sum(t.cnt) OVER ()), 2)  -- AS relative_share
   FROM  (
      SELECT t.txt, count(*) AS cnt
      FROM   token t
      WHERE  t.chartype = 'ALPHABETIC'
      GROUP  BY t.txt
      ORDER  BY cnt DESC
      LIMIT  _max_tokens
      ) t
   ORDER  BY t.cnt DESC;
END
$func$;

The expression sum(t.cnt) OVER () is a window function. You could use a CTE instead of the subquery. Pretty, but a subquery is typically cheaper in simple cases like this one (mostly before Postgres 12).
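
For comparison, the CTE form of the same query might look like the sketch below. It is written as a standalone statement with a literal LIMIT of 10 in place of _max_tokens, it assumes the same token table, and the CTE name top_tokens is made up for the example.

```sql
WITH top_tokens AS (                 -- hypothetical CTE name
   SELECT t.txt, count(*) AS cnt
   FROM   token t                    -- assumed table, as in the answer
   WHERE  t.chartype = 'ALPHABETIC'
   GROUP  BY t.txt
   ORDER  BY cnt DESC
   LIMIT  10                         -- stands in for _max_tokens
   )
SELECT tt.txt
     , tt.cnt
     , round((tt.cnt * 100) / (sum(tt.cnt) OVER ()), 2) AS relative_share
FROM   top_tokens tt
ORDER  BY tt.cnt DESC;
```

Before Postgres 12 a CTE was always materialized separately, which is where the extra cost came from. Since version 12 the planner can inline a side-effect-free CTE that is referenced only once, as here.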

A final explicit RETURN statement is not required (but allowed) when working with OUT parameters or RETURNS TABLE (which makes implicit use of OUT parameters).
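
A minimal sketch of the optional final RETURN, using a hypothetical function demo_numbers that does not depend on any table:

```sql
CREATE OR REPLACE FUNCTION demo_numbers(_n int)   -- hypothetical example function
  RETURNS TABLE (n int)
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY
   SELECT g FROM generate_series(1, _n) g;   -- appends rows to the result set

   RETURN;   -- optional: the function would also finish cleanly without this line
END
$func$;

SELECT * FROM demo_numbers(3);
```

RETURN QUERY only appends rows and does not exit the function, so execution continues until the end of the block or an explicit RETURN.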

round() with two parameters only works for numeric types. count() in the subquery produces a bigint result and a sum() over this bigint produces a numeric result, thus we deal with a numeric number automatically and everything just falls into place.
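
The types involved can be checked directly with pg_typeof(). The three values below are made up for illustration:

```sql
SELECT pg_typeof(count(*))    AS count_type   -- bigint
     , pg_typeof(sum(t.n))    AS sum_type     -- numeric, because n is bigint
     , round(sum(t.n) / 3, 2) AS rounded      -- 7 / 3 as numeric, rounded to 2.33
FROM  (VALUES (1::bigint), (2::bigint), (4::bigint)) t(n);

-- With a floating point input the two-parameter form is not available:
-- SELECT round(2.5::float8, 2);   -- fails: function round(double precision, integer) does not exist
```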
