递归查询中不允许使用聚合函数.是否有其他方法来编写此查询? [英] Aggregate functions are not allowed in a recursive query. Is there an alternative way to write this query?

查看:106
本文介绍了递归查询中不允许使用聚合函数.是否有其他方法来编写此查询?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

TL;DR 我不知道如何编写在其递归部分不使用聚合函数的递归 Postgres 查询.是否有其他方法可以编写如下所示的递归查询?

TL;DR I can't figure out how to write a recursive Postgres query that doesn't use aggregate functions in its recursive part. Is there an alternative way to write the recursive query shown below?

假设我们有一些运动:

CREATE TABLE sports (id INTEGER, name TEXT);

INSERT INTO sports VALUES (1, '100 meter sprint');
INSERT INTO sports VALUES (2, '400 meter sprint');
INSERT INTO sports VALUES (3, '50 meter swim');
INSERT INTO sports VALUES (4, '100 meter swim');

以及参加这些运动的运动员的一些单圈时间:

And some lap times for athletes competing in those sports:

CREATE TABLE lap_times (sport_id INTEGER, athlete TEXT, seconds NUMERIC);

INSERT INTO lap_times VALUES (1, 'Alice',  10);
INSERT INTO lap_times VALUES (1, 'Bob',    11);
INSERT INTO lap_times VALUES (1, 'Claire', 12);

INSERT INTO lap_times VALUES (2, 'Alice',  40);
INSERT INTO lap_times VALUES (2, 'Bob',    38);
INSERT INTO lap_times VALUES (2, 'Claire', 39);

INSERT INTO lap_times VALUES (3, 'Alice',  25);
INSERT INTO lap_times VALUES (3, 'Bob',    23);
INSERT INTO lap_times VALUES (3, 'Claire', 24);

INSERT INTO lap_times VALUES (4, 'Alice',  65);
INSERT INTO lap_times VALUES (4, 'Bob',    67);
INSERT INTO lap_times VALUES (4, 'Claire', 66);

我们想创建一些任意的类别:

We want to create some arbitrary categories:

CREATE TABLE categories (id INTEGER, name TEXT);

INSERT INTO categories VALUES (1, 'Running');
INSERT INTO categories VALUES (2, 'Swimming');
INSERT INTO categories VALUES (3, '100 meter');

并使我们的体育成员成为这些类别的成员:

And make our sports members of those categories:

CREATE TABLE memberships (category_id INTEGER, member_type TEXT, member_id INTEGER);

INSERT INTO memberships VALUES (1, 'Sport', 1);
INSERT INTO memberships VALUES (1, 'Sport', 2);

INSERT INTO memberships VALUES (2, 'Sport', 3);
INSERT INTO memberships VALUES (2, 'Sport', 4);

INSERT INTO memberships VALUES (3, 'Sport', 1);
INSERT INTO memberships VALUES (3, 'Sport', 4);

我们想要一个包含其他类别的超级"类别:

And we want a 'super' category that contains other categories:

INSERT INTO categories VALUES (4, 'Running + Swimming');

INSERT INTO memberships VALUES (4, 'Category', 1);
INSERT INTO memberships VALUES (4, 'Category', 2);

现在是棘手的一点.

我们希望根据运动员在每项运动中的单圈时间对他们进行排名:

We want to rank our athletes on their lap times for each sport:

SELECT sport_id, athlete,
  RANK() over(PARTITION BY sport_id ORDER BY seconds)
FROM lap_times lt;

但我们也希望在类别级别执行此操作.当我们这样做时,运动员的排名应该基于他们在该类别中所有运动的平均排名.例如:

But we also want to do this at a category level. When we do, the rank for the athlete should be based on their average rank across all sports in that category. For example:

Alice is 1st in 100 meter sprint and 3rd in 400 meter sprint
  -> average rank: 2

Bob is 2nd in 100 meter sprint and 1st in 400 meter sprint
  -> average rank: 1.5

Claire is 3rd in 100 meter sprint and 2nd in 400 meter sprint
  -> average rank: 2.5

Ranking for running: 1st Bob, 2nd Alice, 3rd Claire

对于超级"类别,运动员的排名应基于他们跨类别的平均排名,而不是这些类别中的基础运动.即它应该只考虑它的直接孩子,而不是扩展所有的运动.

And for 'super' categories, the rank for the athlete should be based on their average rank across categories, NOT the underlying sports within those categories. i.e. it should only consider its direct children, rather than expanding out all the sports.

我尽力编写查询来计算这些排名.这是一个递归查询,从体育项目的底部开始,并通过会员资格向上计算类别和超级"类别的排名.这是我的查询:

I tried my best to write a query to calculate these rankings. It's a recursive query that starts at the bottom with the sports and works its way up through memberships to calculate rankings for categories and 'super' categories. Here's my query:

WITH RECURSIVE rankings(rankable_type, rankable_id, athlete, value, rank) AS (
  SELECT 'Sport', sport_id, athlete, seconds, RANK() over(PARTITION BY sport_id ORDER BY seconds)
  FROM lap_times lt

  UNION ALL

  SELECT 'Category', category_id, athlete, avg(r.rank), RANK() OVER (PARTITION by category_id ORDER BY avg(r.rank))
  FROM categories c
  JOIN memberships m ON m.category_id = c.id
  JOIN rankings r ON r.rankable_type = m.member_type AND r.rankable_id = m.member_id
  GROUP BY category_id, athlete
)
SELECT * FROM rankings;

但是,当我运行它时,我收到以下错误:

However, when I run it, I receive the following error:

ERROR: aggregate functions are not allowed in a recursive query's recursive term

这是由查询递归部分中的 avg(r.rank) 引起的.Postgresql 不允许在查询的递归部分调用聚合函数.有没有其他的写法?

This is cause by avg(r.rank) in the recursive part of the query. Postgresql doesn't allow aggregate functions to be called in the recursive part of the query. Is there an alternative way to write this?

如果我将 avg(r.rank), RANK() ... 替换为 NULL, NULL 查询将执行并且结果看起来对于运动来说是正确的包括类别的预期行数.

If I swap avg(r.rank), RANK() ... out for NULL, NULL the query executes and the results look correct for sports and it includes the expected number of rows for categories.

我想过可能会尝试使用嵌套查询将递归展开到两个或三个级别,因为这对我的用例来说很好,但我想在尝试之前先在这里问一下.

I thought about maybe trying to unwind the recursion to two or three levels using nested queries as that would be fine for my use case, but I thought I'd ask here first before trying that.

另一种选择可能是更改架构,使其不那么灵活,从而使运动不能属于多个类别.我不确定在这种情况下查询的外观,但它可能更简单?

Another alternative might be to change the schema so it's less flexible so that sports cannot belong to multiple categories. I'm not sure how the query would look in that case, but it might be simpler?

提前致谢,我真的很感激.

Thanks in advance, I really appreciate it.

推荐答案

虽然不漂亮,但我找到了解决方案:

It's not pretty, but I found a solution:

WITH RECURSIVE rankings(rankable_type, rankable_id, athlete, value, rank) AS (
  SELECT 'Sport', sport_id, athlete, seconds, RANK() over(PARTITION BY sport_id ORDER BY seconds)
  FROM lap_times lt

  UNION ALL

  SELECT 'Category', *, rank() OVER(PARTITION by category_id ORDER BY avg_rank) FROM (
    SELECT DISTINCT category_id, athlete, avg(r.rank) OVER (PARTITION by category_id, athlete) AS avg_rank
    FROM categories c
    JOIN memberships m ON m.category_id = c.id
    JOIN rankings r ON r.rankable_type = m.member_type AND r.rankable_id = m.member_id
  ) _
)
SELECT * FROM rankings;

在查询的递归部分,我没有调用 GROUP BY 并计算 avg(r.rank),而是使用在相同列上分区的窗口函数.这与计算平均排名具有相同的效果.

In the recursive part of the query, instead of calling GROUP BY and calculating avg(r.rank), I use a window function partitioned on the same columns. This has the same effect of calculating the average rank.

一个缺点是这种计算发生的次数超过了必要的次数.如果我们可以GROUP BY然后avg(r.rank),那会比avg(r.rank)然后更有效率GROUP BY.

One downside is that this calculation happens more times than is necessary. If we could GROUP BY then avg(r.rank), that would be more efficient than avg(r.rank) then GROUP BY.

由于现在嵌套查询的结果中有重复项,我使用 DISTINCT 来过滤掉这些,然后外部查询计算出一个 RANK()基于这些平均值的每个 category_id 中的所有运动员.

Since there are now duplicates in the result of the nested query, I'm using DISTINCT to filter these out and then the outer query calculates a RANK() of all athletes in each category_id based on these averages.

我仍然很想知道是否有人知道更好的方法来做到这一点.谢谢

I'd still be keen to hear if anyone knows of a better way to do this. Thanks

这篇关于递归查询中不允许使用聚合函数.是否有其他方法来编写此查询?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆