Delete duplicate rows from a BigQuery table

Published: 2021/12/30 22:36:37. Tags: distinct, google-bigquery

Question

I have a table with >1M rows of data and 20+ columns.

Within my table (tableX), I have identified duplicate records (~80k) in one particular column (troubleColumn).

If possible, I would like to retain the original table name and remove the duplicate records from my problematic column; otherwise, I could create a new table (tableXfinal) with the same schema but without the duplicates.

I am not proficient in SQL or any other programming language, so please excuse my ignorance.

DELETE FROM Accidents.CleanedFilledCombined
WHERE Fixed_Accident_Index IN (
  SELECT Fixed_Accident_Index
  FROM Accidents.CleanedFilledCombined
  GROUP BY Fixed_Accident_Index
  HAVING COUNT(Fixed_Accident_Index) > 1
);

Recommended answer

You can remove duplicates by running a query that rewrites your table. You can either use the same table as the destination, or create a new table, verify that it contains what you want, and then copy it over the old table.

A query that should work:

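-- Number the rows within each group of identical Fixed_Accident_Index values,
-- then keep only the first row of each group.
-- Note: the result also contains the helper column row_number.
SELECT *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS row_number
  FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1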
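
The "create a new table, verify, then copy" route described in this answer could be sketched roughly as follows. This is only a sketch and assumes BigQuery standard SQL (for `CREATE OR REPLACE TABLE` and `SELECT * EXCEPT`). The destination name `CleanedFilledCombined_dedup` and the alias `rn` are made up for the example and do not appear in the question.

```sql
-- Step 1: write the deduplicated rows to a new table.
-- CleanedFilledCombined_dedup is a hypothetical name chosen for this example.
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined_dedup AS
SELECT * EXCEPT (rn)  -- drop the helper column so the schema matches the source
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS rn
  FROM Accidents.CleanedFilledCombined
)
WHERE rn = 1;

-- Step 2: verify the result before replacing the original table.
-- If Fixed_Accident_Index contains no NULLs, total_rows should equal distinct_keys.
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT Fixed_Accident_Index) AS distinct_keys
FROM Accidents.CleanedFilledCombined_dedup;
```

Because the window has no ORDER BY, which of the duplicate rows survives is arbitrary. If a particular copy should be kept (for example the most recent one), adding an ORDER BY inside the OVER clause on a suitable column would make the choice deterministic. Once the counts look right, the new table can be copied over the original, or the first statement can target Accidents.CleanedFilledCombined directly to rewrite it in place. Note that a table created this way does not inherit partitioning or clustering settings from the source.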
