Select only rows that have a column changed from the row before it, given a unique ID

Question

I have a PostgreSQL database in which I want to record how a specific column changes over time for each ID. Table1:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        2       213325      x            y        z
    1        2       213326      x            y        z
    1        2       213327      x            y        z
    1        2       213328      x            y        z
    1        3       214330      x            y        z
    1        3       214331      x            y        z
    1        3       214332      x            y        z
    1        2       324543      x            y        z

I want to track every status change over time. Based on this, I want a new table, table2, with the following data:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        3       214330      x            y        z
    1        2       324543      x            y        z

x, y and z are values that can and will vary from row to row. The table holds thousands of other personIDs with changing statuses that I would like to capture as well. A single GROUP BY status, personID is not enough (as I see it), because several rows can share the same status and personID even though a status change happened in between.

I do this in Python, but it's pretty slow (and I guess it's a lot of I/O):

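for person in personid:
    status = -1  # sentinel: no status seen yet for this person
    records = getPersonRecords(person)  # sorted by unixtime in the query
    newrecords = []
    for record in records:
        # keep a record only when its status differs from the previous one
        if record.status != status:
            status = record.status
            newrecords.append(record)
    appendtoDB(newrecords)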

Recommended answer

This is a gaps-and-islands problem. You want the start of each island, which you can identify by comparing the status on the current row to the status on the "previous" record.

Window functions come in handy for this:

select t.*
from (
    select t.*, lag(status) over(partition by personID order by unixtime) lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
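
To try the query without a PostgreSQL server, here is a minimal sketch that runs it against the sample rows from Table1 and stores the result as table2. It makes a few assumptions: an in-memory SQLite database stands in for PostgreSQL (lag() needs SQLite 3.25 or newer), the table is named mytable as in the query above, and columns d, e and f are left out for brevity.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (personID integer, status integer, unixtime integer)"
)

# Sample rows from Table1 (columns d, e, f omitted).
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (1, 2, 213214), (1, 2, 213325), (1, 2, 213326),
        (1, 2, 213327), (1, 2, 213328),
        (1, 3, 214330), (1, 3, 214331), (1, 3, 214332),
        (1, 2, 324543),
    ],
)

# Keep only the first row of each run of identical status per person,
# and materialize those rows as table2.
conn.execute("""
    create table table2 as
    select personID, status, unixtime
    from (
        select t.*,
               lag(status) over (partition by personID order by unixtime) as lag_status
        from mytable t
    ) s
    where lag_status is null or status <> lag_status
""")

result = conn.execute(
    "select personID, status, unixtime from table2 order by personID, unixtime"
).fetchall()
print(result)
# -> [(1, 2, 213214), (1, 3, 214330), (1, 2, 324543)]
```

PostgreSQL also accepts the create table ... as select form, so the same statement should carry over with the real column list. The whole comparison then runs inside the database in one pass, instead of one round trip per personID as in the Python loop from the question.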
