First and last value of window function in one row in PostgreSQL

Problem description

I'd like to get the first value of one column and the last value of a second column in one row for a specified partition. For that, I wrote this query:

SELECT DISTINCT
b.machine_id,
batch,
timestamp_sta,
timestamp_stp,
FIRST_VALUE(timestamp_sta) OVER w AS batch_start,
LAST_VALUE(timestamp_stp) OVER w AS batch_end
FROM db_data.sta_stp AS a
JOIN db_data.ll_lu AS b
ON a.ll_lu_id=b.id
WINDOW w AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
ORDER BY timestamp_sta, batch, machine_id;

But as you can see in the image, the data returned in the batch_end column is not correct.

The batch_start column has the correct first value of the timestamp_sta column. However, batch_end should be "2012-09-17 10:49:45", and instead it equals timestamp_stp from the same row.

Why does this happen?

Recommended answer

The question is old, but this solution is simpler and faster than what's been posted so far:

SELECT b.machine_id
     , batch
     , timestamp_sta
     , timestamp_stp
     , min(timestamp_sta) OVER w AS batch_start
     , max(timestamp_stp) OVER w AS batch_end
FROM   db_data.sta_stp a
JOIN   db_data.ll_lu   b ON a.ll_lu_id = b.id
WINDOW w AS (PARTITION BY batch, b.machine_id) -- No ORDER BY !
ORDER  BY timestamp_sta, batch, machine_id; -- why this ORDER BY?

If you add ORDER BY to the window definition, the default frame runs from the start of the partition up to the current row (including its peers), so each next row with a greater ORDER BY expression has a later frame end. Neither max() nor last_value() can return the "last" timestamp for the whole partition then, while min() and first_value() still see the first row. Without ORDER BY, all rows of the same partition are peers and you get your desired result.
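The effect of the default frame can be reproduced without the original tables. The following is a minimal sketch for PostgreSQL; the CTE name sample, its three rows, and the output column names end_default_frame and end_whole_partition are made up for illustration and are not taken from the question's data.

```sql
-- Hypothetical sample data: one batch on one machine, three start/stop pairs.
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00')
   , (1, 1, timestamp '2012-09-17 10:35:00', timestamp '2012-09-17 10:49:45')
)
SELECT timestamp_sta
     , timestamp_stp
       -- ORDER BY in the window: the frame ends at the current row,
       -- so this repeats timestamp_stp of the current row.
     , last_value(timestamp_stp) OVER w_ordered   AS end_default_frame
       -- No ORDER BY in the window: the frame is the whole partition,
       -- so this is the latest timestamp_stp on every row.
     , max(timestamp_stp)        OVER w_unordered AS end_whole_partition
FROM   sample
WINDOW w_ordered   AS (PARTITION BY batch, machine_id ORDER BY timestamp_sta)
     , w_unordered AS (PARTITION BY batch, machine_id)
ORDER  BY timestamp_sta;
```

With this data, end_default_frame should match timestamp_stp row by row, which is the behavior described in the question, while end_whole_partition stays at 2012-09-17 10:49:45 throughout.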

Your added ORDER BY works (the outer one, not the one in the window definition), but it doesn't seem to make sense and makes the query more expensive. You should probably use an ORDER BY clause that agrees with your window definition to avoid additional sort cost:

... 
ORDER BY batch, b.machine_id, timestamp_sta, timestamp_stp

I don't see the need for DISTINCT in this query. You could just add it if you actually need it. Or DISTINCT ON (). But then the ORDER BY clause becomes even more relevant. See:

  • Select first row in each GROUP BY group?
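As an illustration of the DISTINCT ON variant, here is a hedged sketch that reduces the result to one row per batch and machine. The CTE sample and its rows are hypothetical stand-ins for the joined tables. PostgreSQL requires the leading ORDER BY expressions to match the DISTINCT ON expressions, which is why the ORDER BY clause matters more here.

```sql
-- Hypothetical sample data: two batches on the same machine.
WITH sample(batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (1, 1, timestamp '2012-09-17 10:35:00', timestamp '2012-09-17 10:49:45')
   , (2, 1, timestamp '2012-09-17 11:00:00', timestamp '2012-09-17 11:20:00')
)
SELECT DISTINCT ON (batch, machine_id)      -- keep one row per batch and machine
       batch
     , machine_id
     , min(timestamp_sta) OVER w AS batch_start
     , max(timestamp_stp) OVER w AS batch_end
FROM   sample
WINDOW w AS (PARTITION BY batch, machine_id) -- no ORDER BY: whole partition
-- The leading ORDER BY items must match DISTINCT ON;
-- timestamp_sta decides which row of each group is kept.
ORDER  BY batch, machine_id, timestamp_sta;
```

Window functions are evaluated before DISTINCT ON is applied, so each surviving row still carries the values for its whole partition. If only these two aggregates are needed, a plain GROUP BY would likely do as well.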

If you need some other column(s) from the same row (while still sorting by timestamps), your idea with FIRST_VALUE() and LAST_VALUE() might be the way to go. You'd probably need to append this frame clause to the window definition then:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

See:

  • PostgreSQL query with max and min date plus associated id per row
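Put together, the frame clause from the answer might be used as in the sketch below. It assumes a hypothetical id column as the "other column from the same row"; the CTE sample, its rows, and the output names first_id and last_id are invented for illustration.

```sql
-- Hypothetical sample data with an extra id column per row.
WITH sample(id, batch, machine_id, timestamp_sta, timestamp_stp) AS (
   VALUES
     (101, 1, 1, timestamp '2012-09-17 10:00:00', timestamp '2012-09-17 10:10:00')
   , (102, 1, 1, timestamp '2012-09-17 10:15:00', timestamp '2012-09-17 10:30:00')
   , (103, 1, 1, timestamp '2012-09-17 10:35:00', timestamp '2012-09-17 10:49:45')
)
SELECT id
     , timestamp_sta
     , timestamp_stp
     , first_value(id)            OVER w AS first_id     -- id of the earliest row
     , last_value(id)             OVER w AS last_id      -- id of the latest row
     , first_value(timestamp_sta) OVER w AS batch_start
     , last_value(timestamp_stp)  OVER w AS batch_end
FROM   sample
WINDOW w AS (PARTITION BY batch, machine_id
             ORDER BY timestamp_sta
             -- Explicit frame: every row sees the whole partition.
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY batch, machine_id, timestamp_sta;
```

Note that last_value(timestamp_stp) here returns the stop time of the row with the latest timestamp_sta, which is not necessarily the greatest timestamp_stp in the partition. When only the extreme timestamps matter, min() and max() without ORDER BY remain the simpler choice.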
