How can I traverse a tree bottom-up to calculate a (weighted) average of node values in PostgreSQL?

Question

The typical example of, e.g., summing a whole tree in PostgreSQL uses WITH RECURSIVE (common table expressions). However, these examples usually go top-down, flatten the tree, and apply an aggregate function to the whole result set. I have not found a suitable example (on Stack Overflow, Google, etc.) for the problem I am trying to solve:

Consider an unbalanced tree in which each node can have an associated value. Most values are attached to leaf nodes, but other nodes may have values as well. If a node (leaf or not) has an explicitly attached value, that value is used directly without further calculation (its subtree can then be ignored). If a node has no value, its value is computed as the average of the values of its direct children.
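
The rule above can be written as a recursive definition. The symbols are introduced here only for illustration: $v(n)$ is the value used for node $n$, $x_n$ is its explicitly attached value, and $C(n)$ is the set of its direct children.

$$
v(n) =
\begin{cases}
x_n & \text{if } n \text{ has an explicit value } x_n,\\[4pt]
\dfrac{1}{|C(n)|} \displaystyle\sum_{c \in C(n)} v(c) & \text{otherwise.}
\end{cases}
$$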

However, since not every node is guaranteed to have a value attached, I need to go bottom-up to obtain the overall average. In a nutshell, starting from the leaves, I need to apply AVG() to each set of siblings and use this intermediate result as the value of their parent (if the parent has none). The parent's value (explicitly attached, or the average of its children) is in turn used to compute the average at the next level up (the average of the parent and its siblings).
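
A small Python sketch of this bottom-up procedure, written outside the database, can be used to check expected results. It makes a few assumptions of its own. The tree is held in a dict that maps each id to (parent, caption, value), and None marks both the root's missing parent and a missing value. Nodes are processed deepest first, so every child is settled before its parent. A node with neither a value nor any valued children gets None, and its parent's average skips it, much as SQL AVG() skips NULLs. The question does not specify this case, so that handling is an assumption too.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Optional, Tuple

# Example tree from the question: id -> (parent id, caption, explicit value).
# None marks "no parent" (the root) and "no explicit value".
Node = Tuple[Optional[int], str, Optional[int]]
TREE: Dict[int, Node] = {
    1: (None, "A", None),
    2: (1, "B", 6),
    3: (1, "C", None),
    4: (3, "D", None),
    5: (4, "E", 10),
    6: (4, "F", 2),
    7: (1, "H", 18),
    8: (7, "I", 102),
    9: (7, "J", 301),
}


def bottom_up_values(tree: Dict[int, Node]) -> Dict[int, Optional[float]]:
    # Index the direct children of every node.
    children = defaultdict(list)
    for node_id, (parent, _, _) in tree.items():
        if parent is not None:
            children[parent].append(node_id)

    def depth(node_id: int) -> int:
        # Number of edges between the node and the root.
        d, parent = 0, tree[node_id][0]
        while parent is not None:
            d, parent = d + 1, tree[parent][0]
        return d

    value: Dict[int, Optional[float]] = {}
    # Deepest nodes first, so every child is settled before its parent.
    for node_id in sorted(tree, key=depth, reverse=True):
        explicit = tree[node_id][2]
        if explicit is not None:
            value[node_id] = explicit  # explicit value wins; children are not used
        else:
            known = [value[c] for c in children[node_id] if value[c] is not None]
            value[node_id] = mean(known) if known else None
    return value


values = bottom_up_values(TREE)
print(values[1])
```

For the question's tree this gives D = 6, C = 6 and A = 10.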

Example:

A
+- B (6)
+- C
   +- D
      +- E (10)
      +- F (2)
+- H (18)
   +- I (102)
   +- J (301)

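I need to compute the value for A, which should be 10. C has no value of its own, so it takes the value of D, which is (10+2)/2 = 6. H has an explicit value of 18, so I and J are ignored. Hence A = (6+6+18)/3 = 10.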

Answer

Your data can be stored as:

create table tree(id int primary key, parent int, caption text, node_value int);
insert into tree values
(1, 0, 'A', null),
(2, 1, 'B', 6),
(3, 1, 'C', null),
(4, 3, 'D', null),
(5, 4, 'E', 10),
(6, 4, 'F', 2),
(7, 1, 'H', 18),
(8, 7, 'I', 102),
(9, 7, 'J', 301);

The simplest way to do bottom-up aggregation is a recursive function.

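create or replace function get_node_value(node_id int)
returns int language plpgsql as $$
declare
    val int;
begin
    -- Use the node's own value if it has one.
    select node_value into val
    from tree
    where id = node_id;
    -- Otherwise average the recursively computed values of its direct children.
    -- avg() returns numeric; assigning it to the int variable rounds it.
    if val is null then
        select avg(get_node_value(id)) into val
        from tree
        where parent = node_id;
    end if;
    return val;
end;
$$;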

select get_node_value(1);

 get_node_value 
----------------
             10
(1 row)
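
To see what get_node_value computes, including how it treats NULLs, its recursion can be mirrored in plain Python. This is a hypothetical sketch. ROWS is an in-memory copy of the table above, and the rounding to int is left out. The mirror also shows one consequence of using avg(): a node with neither a value nor children evaluates to NULL (None here), and avg() skips that NULL when averaging the parent's children.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory copy of the tree table: (id, parent, caption, node_value).
Row = Tuple[int, int, str, Optional[int]]
ROWS: List[Row] = [
    (1, 0, "A", None), (2, 1, "B", 6), (3, 1, "C", None),
    (4, 3, "D", None), (5, 4, "E", 10), (6, 4, "F", 2),
    (7, 1, "H", 18), (8, 7, "I", 102), (9, 7, "J", 301),
]


def get_node_value(node_id: int, rows: List[Row] = ROWS) -> Optional[float]:
    """Mirror of the PL/pgSQL get_node_value, without its rounding to int."""
    # select node_value ... where id = node_id  (no matching row -> NULL)
    val = next((r[3] for r in rows if r[0] == node_id), None)
    if val is None:
        # avg(get_node_value(id)) over the direct children:
        # AVG skips NULLs and yields NULL when nothing is left.
        child_vals = [get_node_value(r[0], rows) for r in rows if r[1] == node_id]
        known = [v for v in child_vals if v is not None]
        val = sum(known) / len(known) if known else None
    return val


print(get_node_value(1))  # 10.0
```

For example, if you add a valueless, childless node K under A, get_node_value(1) still returns 10 in this mirror, because K's NULL is ignored.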

The same can be done with an SQL-language function. Its code is less obvious, but it may be a bit faster than the PL/pgSQL version.

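create or replace function get_node_value_sql(node_id int)
returns int language sql as $$
    -- Use the node's own value; if it is null, fall back to the
    -- rounded average of its children's recursively computed values.
    select coalesce(
        node_value,
        (
            select avg(get_node_value_sql(id))::int
            from tree
            where parent = node_id
        )
    )
    from tree
    where id = node_id;
$$;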
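
Both functions are declared `returns int`. So every intermediate average is rounded to an integer before the level above uses it. The PL/pgSQL version rounds by assigning to an int variable, and the SQL version rounds with `::int`. Both conversions go from numeric, because avg() of integers returns numeric, and that cast rounds ties away from zero. The Python sketch below uses a made-up tree, not the one from the question, to show how rounding at each level can drift from the exact result. The helper names `to_int` and `node_value` are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, List, Optional

# Hypothetical tree, not from the question: A has children X (no value) and L (1);
# X has leaf children P (1) and Q (2).
VALUES: Dict[str, Optional[int]] = {"A": None, "X": None, "L": 1, "P": 1, "Q": 2}
CHILDREN: Dict[str, List[str]] = {"A": ["X", "L"], "X": ["P", "Q"]}


def to_int(x: Decimal) -> int:
    """Round to the nearest integer, ties away from zero (like a numeric -> int cast)."""
    return int(x.quantize(Decimal(1), rounding=ROUND_HALF_UP))


def node_value(node: str, round_each_level: bool) -> Optional[Decimal]:
    explicit = VALUES[node]
    if explicit is not None:
        return Decimal(explicit)
    child_vals = [node_value(c, round_each_level) for c in CHILDREN.get(node, [])]
    known = [v for v in child_vals if v is not None]
    if not known:
        return None
    avg = sum(known) / len(known)
    # round_each_level=True imitates `returns int`: every intermediate average is rounded.
    return Decimal(to_int(avg)) if round_each_level else avg


print(node_value("A", round_each_level=False))  # 1.25
print(node_value("A", round_each_level=True))   # 2
```

The question's example only produces integral averages, so it is not affected. If exact results matter, one option is to declare the return type as numeric (and, in the PL/pgSQL version, `val` as well). Intermediate averages then stay unrounded.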


Traversing a tree bottom-up with a recursive CTE is not especially complicated. The difficulty in this particular case is that the average has to be computed for each level separately.

with recursive bottom_up(id, parent, caption, node_value, level, calculated) as (
    -- Start from the leaves (nodes that are nobody's parent) at level 0.
    select
        *,
        0,
        node_value calculated
    from tree t
    where not exists (
        select id
        from tree
        where parent = t.id)
union all
    -- Step up to the parent (level = steps from the starting leaf):
    -- keep the parent's own value if it has one, else carry the child's value.
    select
        t.*,
        b.level + 1,
        coalesce(t.node_value, b.calculated)
    from tree t
    join bottom_up b on t.id = b.parent
)

-- Average each node's rows per level first, then average those per-level results.
select id, parent, caption, avg(calculated)::int calculated
from (
    select id, parent, caption, level, avg(calculated)::int calculated
    from bottom_up
    group by 1, 2, 3, 4
    ) s
group by 1, 2, 3
order by 1;
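
Modelling this query in Python shows how its result relates to the recursive definition used by the functions. In the hypothetical sketch below, which omits the `::int` casts, each leaf's value is carried up its own path. The final query then groups by distance from the starting leaf (`level`), not by parent. For the question's tree the model gives 10, the same as the functions. For a made-up tree in which one child's subtree is deeper than the other's, it gives 12.5, while the recursive definition gives 7.5. So the CTE variant seems to match the functions only for some tree shapes, and checking it against them on your own data seems prudent.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Tuple

Row = Tuple[int, int, str, Optional[int]]  # (id, parent, caption, node_value); root parent = 0


def cte_estimate(rows: List[Row], node_id: int) -> Optional[float]:
    """Model of the bottom_up query's result for one node, ignoring its ::int casts."""
    by_id = {r[0]: r for r in rows}
    parents = {r[1] for r in rows}
    # Non-recursive term: every leaf (an id that is nobody's parent) at level 0.
    frontier = [(r[0], r[1], 0, r[3]) for r in rows if r[0] not in parents]
    produced = list(frontier)
    # Recursive term (UNION ALL keeps duplicates): step each row up to its parent.
    while frontier:
        step = []
        for _, parent, level, calc in frontier:
            if parent in by_id:
                p = by_id[parent]
                step.append((p[0], p[1], level + 1, p[3] if p[3] is not None else calc))
        produced.extend(step)
        frontier = step
    # Outer query: average per (node, level), then average the per-level results.
    per_level = defaultdict(list)
    for nid, _, level, calc in produced:
        if nid == node_id and calc is not None:
            per_level[level].append(calc)
    level_avgs = [mean(vals) for vals in per_level.values()]
    return mean(level_avgs) if level_avgs else None


def exact_value(rows: List[Row], node_id: int) -> Optional[float]:
    """Recursive definition: explicit value, else mean of the children's values."""
    own = next(r[3] for r in rows if r[0] == node_id)
    if own is not None:
        return own
    child_vals = [exact_value(rows, r[0]) for r in rows if r[1] == node_id]
    known = [v for v in child_vals if v is not None]
    return mean(known) if known else None


QUESTION_ROWS: List[Row] = [
    (1, 0, "A", None), (2, 1, "B", 6), (3, 1, "C", None),
    (4, 3, "D", None), (5, 4, "E", 10), (6, 4, "F", 2),
    (7, 1, "H", 18), (8, 7, "I", 102), (9, 7, "J", 301),
]
# Hypothetical tree: A -> X, Y; X -> P(0); Y -> Q(10), Z; Z -> R(20).
COUNTER_ROWS: List[Row] = [
    (1, 0, "A", None), (2, 1, "X", None), (3, 1, "Y", None),
    (4, 2, "P", 0), (5, 3, "Q", 10), (6, 3, "Z", None), (7, 6, "R", 20),
]

print(cte_estimate(QUESTION_ROWS, 1), exact_value(QUESTION_ROWS, 1))
print(cte_estimate(COUNTER_ROWS, 1), exact_value(COUNTER_ROWS, 1))
```

In the second tree, A's rows fall into level 2 (values 0 and 10, average 5) and level 3 (value 20). The model therefore averages 5 and 20 and gets 12.5. The recursive definition averages X = 0 and Y = (10+20)/2 = 15 and gets 7.5.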
