How to label groups in PostgreSQL when group membership depends on the preceding row?
Problem description
I want, in a query, to fill all NULL values with the last known value. When the data is in a table and not in a query, it's easy:
If I define and fill my table as follows:
CREATE TABLE test_fill_null (
date INTEGER,
value INTEGER
);
INSERT INTO test_fill_null VALUES
(1,2),
(2, NULL),
(3, 45),
(4,NULL),
(5, null);
SELECT * FROM test_fill_null ;
date | value
------+-------
1 | 2
2 |
3 | 45
4 |
5 |
Then I just have to fill like that:
UPDATE test_fill_null t1
SET value = (
SELECT t2.value
FROM test_fill_null t2
WHERE t2.date <= t1.date AND value IS NOT NULL
ORDER BY t2.date DESC
LIMIT 1
);
SELECT * FROM test_fill_null;
date | value
------+-------
1 | 2
2 | 2
3 | 45
4 | 45
5 | 45
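The UPDATE applies a simple rule: each row takes the most recent non-null value at or before its date. The Python sketch below shows that rule outside the database, purely to illustrate the semantics. Representing rows as `(date, value)` tuples, with `None` standing in for SQL NULL, is an assumption made for this example. The helper name `fill_forward` is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (date, value); None plays the role of SQL NULL


def fill_forward(rows: List[Row]) -> List[Row]:
    """Replace each NULL with the last non-null value seen so far, in date order."""
    filled: List[Row] = []
    last: Optional[int] = None
    for date, value in sorted(rows, key=lambda r: r[0]):
        if value is not None:
            last = value  # a new known value
        filled.append((date, last))  # NULL rows inherit the last known value
    return filled


print(fill_forward([(1, 2), (2, None), (3, 45), (4, None), (5, None)]))
# [(1, 2), (2, 2), (3, 45), (4, 45), (5, 45)]
```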
But now I'm in a query, like this one:
WITH
pre_table AS(
SELECT
id1,
id2,
tms,
CASE
WHEN tms - lag(tms) over w < interval '5 minutes' THEN NULL
ELSE id2
END as group_id
FROM
table0
window w as (partition by id1 order by tms)
)
Here group_id is set to id2 when the previous point is at least 5 minutes earlier (or there is no previous point), and to NULL otherwise. By doing so, I want to end up with groups of points that follow each other by less than 5 minutes, with gaps of 5 minutes or more between groups.
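The CASE expression in pre_table only marks the first row of each group (group_id = id2). Every row in between stays NULL, so a "last known value" fill is still needed afterwards. The Python sketch below mirrors just this marking step. It assumes rows are `(id1, id2, tms)` tuples, and `mark_group_starts` is a hypothetical name.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import List, Optional, Tuple

GAP = timedelta(minutes=5)
Row = Tuple[int, int, datetime]  # (id1, id2, tms)


def mark_group_starts(rows: List[Row]) -> List[Tuple[int, int, datetime, Optional[int]]]:
    out = []
    # window w AS (PARTITION BY id1 ORDER BY tms)
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for id1, part in groupby(ordered, key=lambda r: r[0]):
        prev: Optional[datetime] = None
        for _, id2, tms in part:
            # lag(tms) is NULL on the first row of a partition, so the CASE
            # falls through to ELSE id2 there.
            if prev is not None and tms - prev < GAP:
                group_id = None  # continuation row: left NULL
            else:
                group_id = id2  # first row of a new group
            out.append((id1, id2, tms, group_id))
            prev = tms
    return out
```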
Then I don't know how to proceed. I tried:
SELECT distinct on (id1, id2)
t0.id1,
t0.id2,
t0.tms,
t1.group_id
FROM
pre_table t0
LEFT JOIN (
select
id1,
tms,
group_id
from pre_table t2
where t2.group_id is not null
order by tms desc
) t1
ON
t1.tms <= t0.tms AND
t1.id1 = t0.id1
WHERE
t0.id1 IS NOT NULL
ORDER BY
id1,
id2,
t1.tms DESC
But in the final result I have some groups with two consecutive points that are more than 5 minutes apart. They should be two different groups in this case.
A "select within a select" is more commonly called a "subselect" or "subquery". In your particular case it's a correlated subquery. LATERAL joins (new in Postgres 9.3) can largely replace correlated subqueries with more flexible solutions.
I don't think you need either here.
For your first case this query is probably faster and simpler, though:
SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM (
SELECT *, count(value) OVER (ORDER BY date) AS grp
FROM test_fill_null
) sub;
count() only counts non-null values, so grp is incremented with every non-null value, thereby forming groups as desired. It's trivial to pick the one non-null value per grp in the outer SELECT.
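The running-count trick can be made concrete with a rough Python equivalent of the two window functions. This sketch assumes dates are unique. With duplicate dates, the SQL default RANGE frame would give peer rows the same count, and this loop does not model that. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, Optional[int]]  # (date, value); None stands in for SQL NULL


def fill_by_running_count(rows: List[Row]) -> List[Row]:
    ordered = sorted(rows, key=lambda r: r[0])

    # count(value) OVER (ORDER BY date): only non-null values increment the count,
    # so each non-null value opens a new grp and the following NULLs join it.
    grps: List[int] = []
    running = 0
    for _, value in ordered:
        if value is not None:
            running += 1
        grps.append(running)

    # max(value) OVER (PARTITION BY grp): each grp holds at most one non-null
    # value, so the max is that value (absent for leading NULL rows in grp 0).
    grp_max: Dict[int, int] = {}
    for (_, value), grp in zip(ordered, grps):
        if value is not None:
            grp_max[grp] = max(grp_max.get(grp, value), value)

    return [(date, grp_max.get(grp)) for (date, _), grp in zip(ordered, grps)]


print(fill_by_running_count([(1, 2), (2, None), (3, 45), (4, None), (5, None)]))
# [(1, 2), (2, 2), (3, 45), (4, 45), (5, 45)]
```

Rows before the first non-null value fall into grp 0 and stay NULL. This matches the SQL, where max() over a partition of only NULLs returns NULL.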
For your second case, I'll assume the initial order of rows is determined by (id1, id2, tms), as indicated by one of your queries.
SELECT id1, id2, tms
, count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM (
SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
< tms - interval '5 min'
THEN true END AS step
FROM table0
) sub
ORDER BY id1, id2, tms;
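The second case uses the same pattern: flag each row that starts a new group, then take a running count of the flags. Below is a minimal Python sketch. It assumes rows are `(id1, id2, tms)` tuples with unique sort keys, processed in (id1, id2, tms) order as in the query. `label_groups` and `at` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

GAP = timedelta(minutes=5)
Row = Tuple[int, int, datetime]  # (id1, id2, tms)


def label_groups(rows: List[Row]) -> List[Tuple[int, int, datetime, int]]:
    out = []
    group_id = 0
    prev_id1: Optional[int] = None
    prev_tms: Optional[datetime] = None
    for id1, id2, tms in sorted(rows):  # ORDER BY id1, id2, tms
        # lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms):
        # the first row of an id1 partition compares against '-infinity'.
        if id1 != prev_id1:
            prev_tms = None
        step = prev_tms is None or prev_tms < tms - GAP
        if step:
            # In SQL, step is TRUE or NULL and count(step) skips the NULLs.
            group_id += 1
        out.append((id1, id2, tms, group_id))
        prev_id1, prev_tms = id1, tms
    return out


def at(minute: int) -> datetime:
    return datetime(2024, 1, 1, 10, minute)


rows = [(1, 1, at(0)), (1, 2, at(3)), (1, 3, at(20)), (1, 4, at(22)), (2, 1, at(1))]
print([g for *_, g in label_groups(rows)])  # [1, 1, 2, 2, 3]
```

The comparison is strict (`<`), so a gap of exactly 5 minutes keeps two rows in the same group. The question's CASE (`< interval '5 minutes'` yields NULL) instead starts a new group at exactly 5 minutes. Pick whichever boundary you need.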
Adapt to your actual order. One of these might cover it:
PARTITION BY id1 ORDER BY id2 -- ignore tms
PARTITION BY id1 ORDER BY tms -- ignore id2