Dedupe and retain record with most recent timestamp

Question

I'm working with a wide dataset with 500+ columns. The dataset contains a customer ID field and a time-stamp field. I'd like to query the data and end up with a table with only one row per customer ID field where the row retained is the row with the most recent timestamp. The query will be run on a Netezza server if that makes a difference. It seems like I could do this with a sub-query, but I can't seem to get syntax that works.

Recommended answer

Here is a typical way to approach this problem:

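-- "my_table" stands in for the real table name (TABLE itself is a reserved word).
-- Keep a row only if no other row for the same customer has a later timestamp.
select t.*
from my_table t
where not exists (select 1
                  from my_table t2
                  where t2.customerid = t.customerid
                    and t2.timestamp > t.timestamp
                 );
-- Rows tied on the latest timestamp are all returned.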

This rephrases the question to: "Get me all rows from the table where there is no row with the same customer id and a larger timestamp."
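
One caveat with the strict "larger timestamp" comparison is that two rows for the same customer sharing the latest timestamp would both survive. If exactly one row per customer is required, a window-function variant may be worth trying. The following is only a sketch, not part of the original answer: it assumes the server supports row_number() (Netezza generally offers analytic functions, but check your version), and my_table is a hypothetical placeholder table name alongside the customerid and timestamp columns used above.

```sql
-- Sketch: number each customer's rows newest-first, then keep row 1.
-- my_table is a placeholder; substitute the real table name.
select *
from (select t.*,
             row_number() over (partition by t.customerid
                                order by t.timestamp desc) as rn
      from my_table t
     ) ranked
where rn = 1;
```

Note that the result carries an extra rn column, and when timestamps tie, which row gets rn = 1 is arbitrary unless another column is added to the order by as a tiebreaker.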
