How to label groups in PostgreSQL when group membership depends on the preceding row?


Question

Within a query, I want to fill all NULL values with the last known value. When the data is stored in a table rather than produced by a query, it's easy:

If I define and fill my table as follows:

CREATE TABLE test_fill_null (
    date INTEGER,
    value INTEGER
);

INSERT INTO test_fill_null VALUES 
    (1,2),
    (2, NULL), 
    (3, 45), 
    (4,NULL), 
    (5, null);

SELECT * FROM test_fill_null ;
 date | value 
------+-------
    1 |     2
    2 |      
    3 |    45
    4 |      
    5 |      

Then I can simply fill the gaps like this:

UPDATE test_fill_null t1
SET value = (
    SELECT t2.value 
    FROM test_fill_null t2 
    WHERE t2.date <= t1.date AND value IS NOT NULL 
    ORDER BY t2.date DESC 
    LIMIT 1
);

SELECT * FROM test_fill_null;
 date | value 
------+-------
    1 |     2
    2 |     2
    3 |    45
    4 |    45
    5 |    45

But now I'm working inside a query, like this one:

WITH
    pre_table AS(
        SELECT 
            id1,
            id2,
            tms,
            CASE 
                WHEN tms - lag(tms) over w < interval '5 minutes' THEN NULL
                ELSE id2
            END as group_id
        FROM
            table0 
        window w as (partition by id1 order by tms)
    )

Here group_id is set to id2 when the previous point is more than 5 minutes away, and to NULL otherwise. This way, I want to end up with groups of points that follow each other by less than 5 minutes, with gaps of more than 5 minutes between groups.

Then I don't know how to proceed. I tried:

    SELECT distinct on (id1, id2)
        t0.id1,
        t0.id2,
        t0.tms,
        t1.group_id
    FROM
        pre_table t0
        LEFT JOIN (
            select
                id1,
                tms,
                group_id
            from pre_table t2
            where t2.group_id is not null
            order by tms desc
        ) t1
        ON 
            t1.tms <= t0.tms AND
            t1.id1 = t0.id1  
    WHERE 
        t0.id1 IS NOT NULL
    ORDER BY
        id1,
        id2,
        t1.tms DESC

But in the final result, some groups contain two consecutive points that are more than 5 minutes apart. These should be two different groups.

Solution

A "select within a select" is more commonly called a "subselect" or "subquery". In your particular case it's a correlated subquery. LATERAL joins (new in Postgres 9.3) can largely replace correlated subqueries with more flexible solutions.

I don't think you need either here.

For your first case this query is probably faster and simpler, though:

SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM  (
   SELECT *, count(value) OVER (ORDER BY date) AS grp
   FROM   test_fill_null
   ) sub;

count() only counts non-null values, so grp is incremented with every non-null value, thereby forming groups as desired. It's trivial to pick the one non-null value per grp in the outer SELECT.
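To make the intermediate grp values visible, here is a minimal sketch that runs the same query against the sample data. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25 or later supports these window functions); the helper name fill_forward is hypothetical.

```python
import sqlite3


def fill_forward():
    """Run the count()/max() fill on the sample table (SQLite stand-in, an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_fill_null (date INTEGER, value INTEGER)")
    conn.executemany(
        "INSERT INTO test_fill_null VALUES (?, ?)",
        [(1, 2), (2, None), (3, 45), (4, None), (5, None)],
    )
    # Inner step only: the running count of non-null values acts as the group label.
    groups = conn.execute(
        """
        SELECT date, value, count(value) OVER (ORDER BY date) AS grp
        FROM   test_fill_null
        ORDER  BY date
        """
    ).fetchall()
    # Full query: spread the single non-null value of each group over the whole group.
    filled = conn.execute(
        """
        SELECT date, max(value) OVER (PARTITION BY grp) AS value
        FROM  (
           SELECT *, count(value) OVER (ORDER BY date) AS grp
           FROM   test_fill_null
           ) sub
        ORDER  BY date
        """
    ).fetchall()
    conn.close()
    return groups, filled


groups, filled = fill_forward()
for row in groups:
    print(row)   # (date, value, grp)
print(filled)
```

Rows 1 and 2 share grp 1 and rows 3 to 5 share grp 2, so each group holds exactly one non-null value for max() to pick up.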


For your second case, I'll assume the initial order of rows is determined by (id1, id2, tms) as indicated by one of your queries.

SELECT id1, id2, tms
     , count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM  (
   SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
                       < tms - interval '5 min'
                  THEN true END AS step
   FROM   table0
   ) sub
ORDER  BY id1, id2, tms;

Adapt to your actual order. One of these might cover it:

PARTITION BY id1 ORDER BY id2  -- ignore tms
PARTITION BY id1 ORDER BY tms  -- ignore id2
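The step-and-count pattern can be tried out with a small sketch like the one below. It is not the original setup: it assumes Python's sqlite3 module as a stand-in for PostgreSQL, stores tms as integer seconds (so the 5 minute interval becomes 300), replaces '-infinity' with a large negative number, uses the "ignore id2" ordering, and the sample rows and the helper name label_groups are invented for illustration.

```python
import sqlite3


def label_groups():
    """Label runs of points less than 5 minutes apart (SQLite stand-in, an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table0 (id1 INTEGER, id2 INTEGER, tms INTEGER)")
    # Hypothetical sample data; tms is in seconds from an arbitrary origin.
    conn.executemany(
        "INSERT INTO table0 VALUES (?, ?, ?)",
        [
            (1, 10, 0), (1, 11, 60), (1, 12, 200),   # gaps under 300 s: one group
            (1, 13, 900), (1, 14, 1000),             # 700 s gap before 900: new group
            (2, 20, 50), (2, 21, 1000),              # new id1, then a 950 s gap
        ],
    )
    rows = conn.execute(
        """
        SELECT id1, id2, tms
             , count(step) OVER (ORDER BY id1, tms) AS group_id
        FROM  (
           -- step is non-null only where a new group starts;
           -- -999999999 stands in for '-infinity' so the first row per id1 starts a group
           SELECT *, CASE WHEN lag(tms, 1, -999999999) OVER (PARTITION BY id1 ORDER BY tms)
                               < tms - 300
                          THEN 1 END AS step
           FROM   table0
           ) sub
        ORDER  BY id1, tms
        """
    ).fetchall()
    conn.close()
    return rows


for row in label_groups():
    print(row)   # (id1, id2, tms, group_id)
```

Because count() skips NULLs, the running count only increases on rows flagged as a step, which gives every run of close points the same group_id.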

SQL Fiddle with an extended example.

