How to group following rows by not unique value


Question

I have data like this:

table1
_____________
id way time
1  1   00:01
2  1   00:02
3  2   00:03
4  2   00:04
5  2   00:05
6  3   00:06
7  3   00:07
8  1   00:08
9  1   00:09

I would like to know in which time interval I was on which way:

desired output
_________________
id  way from   to    
1   1   00:01  00:02
3   2   00:03  00:05
6   3   00:06  00:07
8   1   00:08  00:09

I tried using a window function:

SELECT DISTINCT
  first_value(id) OVER w AS id, 
  first_value(way) OVER w as way,
  first_value(time) OVER w as from,
  last_value(time) OVER w as to
FROM table1
WINDOW w AS (
  PARTITION BY way ORDER BY ID
  range between unbounded preceding and unbounded following);

What I get is:

ID  way from   to    
 1   1  00:01  00:09
 3   2  00:03  00:05
 6   3  00:06  00:07

This is not correct, because I was not on way 1 for the whole time from 00:01 to 00:09. Is it possible to partition according to the order, meaning that only consecutive rows with equal values are grouped together?

Recommended answer

If your case is as simple as the example values suggest, @Giorgos' answer serves nicely.

However, that's typically not the case. If the id column is a serial, you cannot rely on the assumption that a row with an earlier time also has a smaller id.
Also, time values (or timestamp values, which you probably have) can easily be duplicates, so you need to make the sort order unambiguous.

Assuming both can happen, and you want the id from the row with the earliest time per time slice (actually, the smallest id for the earliest time, since there could be ties), this query would deal with the situation properly:

SELECT *
FROM  (
   SELECT DISTINCT ON (way, grp)
          id, way, time AS time_from
        , max(time) OVER (PARTITION BY way, grp) AS time_to
   FROM (
      SELECT *
           , row_number() OVER (ORDER BY time, id)  -- id as tie breaker
           - row_number() OVER (PARTITION BY way ORDER BY time, id) AS grp
      FROM   table1
      ) t
   ORDER  BY way, grp, time, id
   ) sub
ORDER  BY time_from, id;




  • ORDER BY time, id to be unambiguous. Assuming time is not unique, add the (assumed unique) id to avoid arbitrary results that could change between queries in sneaky ways.

  • max(time) OVER (PARTITION BY way, grp): without ORDER BY, the window frame spans all rows of the partition, so we get the absolute maximum per time slice.

  • The outer query layer is only necessary to produce the desired sort order in the result, since we are bound to a different ORDER BY in the subquery sub by using DISTINCT ON. Details:

      • Select first row in each GROUP BY group?
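To see why the difference of the two row numbers identifies each run of consecutive rows, the inner subquery can be run on its own against the sample rows from the question. This is only a sketch: the inline CTE and the column aliases rn_all and rn_way are made up for illustration, and the time column is called ts here (an assumed name) to avoid using a type name as an identifier.

```sql
-- Sample rows from the question, inlined as a CTE.
-- Assumption: the time column is named ts in this sketch.
WITH table1 (id, way, ts) AS (
   VALUES (1, 1, time '00:01')
        , (2, 1, '00:02')
        , (3, 2, '00:03')
        , (4, 2, '00:04')
        , (5, 2, '00:05')
        , (6, 3, '00:06')
        , (7, 3, '00:07')
        , (8, 1, '00:08')
        , (9, 1, '00:09')
   )
SELECT id, way
     , row_number() OVER (ORDER BY ts, id)                  AS rn_all  -- position over all rows
     , row_number() OVER (PARTITION BY way ORDER BY ts, id) AS rn_way  -- position within the same way
     , row_number() OVER (ORDER BY ts, id)
     - row_number() OVER (PARTITION BY way ORDER BY ts, id) AS grp     -- constant within one run
FROM   table1
ORDER  BY ts, id;

-- id | way | rn_all | rn_way | grp
--  1 |   1 |      1 |      1 |   0
--  2 |   1 |      2 |      2 |   0
--  3 |   2 |      3 |      1 |   2
--  4 |   2 |      4 |      2 |   2
--  5 |   2 |      5 |      3 |   2
--  6 |   3 |      6 |      1 |   5
--  7 |   3 |      7 |      2 |   5
--  8 |   1 |      8 |      3 |   5
--  9 |   1 |      9 |      4 |   5
```

Within a run of equal way values both row numbers advance by one per row, so their difference stays constant. Note that grp = 5 appears for way 3 and again for the second run of way 1: grp alone does not identify a slice, only the pair (way, grp) does, which is why the query partitions by both.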

SQL Fiddle demonstrating the use case.

If you are looking to optimize performance, a plpgsql function could be faster in such a case. Closely related answer:

  • Group by repeating attribute
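A procedural version could look roughly like the following. This is a minimal sketch, not the function from the linked answer: the function name f_way_intervals is hypothetical, and it assumes table1 has columns (id int, way int, ts time) with id defined NOT NULL. It reads the rows once in sorted order and emits a result row each time way changes.

```sql
-- Hypothetical function; assumes table1(id int NOT NULL, way int, ts time).
CREATE OR REPLACE FUNCTION f_way_intervals()
  RETURNS TABLE (id int, way int, time_from time, time_to time)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r record;
BEGIN
   FOR r IN
      SELECT t.id, t.way, t.ts          -- table-qualified to avoid clashes with the OUT columns
      FROM   table1 t
      ORDER  BY t.ts, t.id              -- id as tie breaker, as in the query
   LOOP
      -- id IS NULL only before the first row; afterwards compare with the current slice
      IF id IS NULL OR r.way IS DISTINCT FROM way THEN
         IF id IS NOT NULL THEN
            RETURN NEXT;                -- emit the slice that just ended
         END IF;
         id        := r.id;             -- first row of the new slice
         way       := r.way;
         time_from := r.ts;
      END IF;
      time_to := r.ts;                  -- last row seen so far in the current slice
   END LOOP;

   IF id IS NOT NULL THEN
      RETURN NEXT;                      -- emit the final slice
   END IF;
END
$func$;

SELECT * FROM f_way_intervals();
```

Because the rows arrive sorted by ts, the last row of each slice also carries its maximum time, so no separate max() is needed. Whether this actually beats the window-function query depends on the data and should be measured.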

Aside: don't use the basic type name time as an identifier (it is also a reserved word in standard SQL).
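If renaming is an option, one way to follow this advice is sketched below. The new name ts is only an assumption for illustration; any descriptive, non-keyword name works.

```sql
-- Double-quoting "time" makes sure it is parsed as an identifier, not as the type name.
-- "ts" is a hypothetical replacement name.
ALTER TABLE table1 RENAME COLUMN "time" TO ts;
```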
