How to deduplicate in Presto

Views: 52 | Published: 2021/6/21 18:38:14 | Tags: sql, presto

Question

I have a Presto table; assume it has the columns [id, name, update_time] and the following data:

(1, Amy, 2018-08-01),
(1, Amy, 2018-08-02),
(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)

Now I want to run a SQL query whose result is:

(1, Amyyyyyyy, 2018-08-03),
(2, Bob, 2018-08-01)

Currently, my best way to deduplicate in Presto is the following:

select 
    t1.id, 
    t1.name,
    t1.update_time 
from table_name t1
join (select id, max(update_time) as update_time from table_name group by id) t2
    on t1.id = t2.id and t1.update_time = t2.update_time
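To check the join-on-max query above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for Presto here (an assumption for illustration, not a Presto setup). The dates are stored as ISO-8601 text, so max() compares them correctly as strings. The helper name dedup_join_on_max and the added ORDER BY are hypothetical additions for the sketch.

```python
import sqlite3

# Sample rows from the question: (id, name, update_time).
ROWS = [
    (1, "Amy", "2018-08-01"),
    (1, "Amy", "2018-08-02"),
    (1, "Amyyyyyyy", "2018-08-03"),
    (2, "Bob", "2018-08-01"),
]

# The question's join-on-max query; ORDER BY is added only so the
# output order is deterministic.
JOIN_ON_MAX = """
select t1.id, t1.name, t1.update_time
from table_name t1
join (select id, max(update_time) as update_time
      from table_name
      group by id) t2
  on t1.id = t2.id and t1.update_time = t2.update_time
order by t1.id
"""


def dedup_join_on_max(rows):
    """Load rows into an in-memory SQLite table and run the join-on-max query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table_name (id integer, name text, update_time text)"
        )
        conn.executemany("insert into table_name values (?, ?, ?)", rows)
        return conn.execute(JOIN_ON_MAX).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Expected: [(1, 'Amyyyyyyy', '2018-08-03'), (2, 'Bob', '2018-08-01')]
    print(dedup_join_on_max(ROWS))
```

One property worth knowing: if two rows for the same id share the maximum update_time, this join returns both of them, so it does not guarantee exactly one row per id. For example, adding (2, 'Robert', '2018-08-01') to the data makes id 2 appear twice.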

For more information, see: deduplication in SQL

Is there a better way to deduplicate in Presto?

Solution

In PrestoDB, I would be inclined to use row_number():

select id, name, update_time
from (select t.*,
             -- number each id's rows from newest to oldest
             row_number() over (partition by id order by update_time desc) as seqnum
      from table_name t
     ) t
where seqnum = 1;  -- keep only the newest row per id
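The same kind of check for the row_number() approach, again as a sketch that uses SQLite (version 3.25 or newer, which added window functions) as a stand-in for Presto. The query below partitions by id and orders by update_time desc, the columns from the question's table, so each id keeps only its latest row. The helper name dedup_row_number is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, name, update_time).
ROWS = [
    (1, "Amy", "2018-08-01"),
    (1, "Amy", "2018-08-02"),
    (1, "Amyyyyyyy", "2018-08-03"),
    (2, "Bob", "2018-08-01"),
]

# row_number() dedup: number each id's rows newest-first, keep row 1.
# ORDER BY on the outer query is added only for deterministic output.
ROW_NUMBER = """
select id, name, update_time
from (select t.*,
             row_number() over (partition by id order by update_time desc) as seqnum
      from table_name t
     ) t
where seqnum = 1
order by id
"""


def dedup_row_number(rows):
    """Load rows into an in-memory SQLite table and run the row_number() query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table table_name (id integer, name text, update_time text)"
        )
        conn.executemany("insert into table_name values (?, ?, ?)", rows)
        return conn.execute(ROW_NUMBER).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Expected: [(1, 'Amyyyyyyy', '2018-08-03'), (2, 'Bob', '2018-08-01')]
    print(dedup_row_number(ROWS))
```

Unlike the join, row_number() always yields exactly one row per id. When two rows tie on update_time, which one gets seqnum = 1 is unspecified; adding a tiebreaker to the window's ORDER BY (for example `order by update_time desc, name`) makes the choice deterministic.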
