删除bigquery中的重复数据 [英] Remove Duplicate Data in bigquery
本文介绍了删除bigquery中的重复数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
抱歉,在这里有一个新手SQL问题,我有一个数据库,其中包含类似形式的数据
Sorry a new-bie SQL question here ,I have the a DB containing data in a similar form
U1 Program_Id A_ID Date
x1 A 1 1
x1 A 2 1
x1 A 3 1
x1 B 1 1
x1 B 2 1
x1 B 3 1
x2 B 1 1
x2 B 2 1
我想考虑所有四列来重复数据,这样我的最终输出将如下所示
I would like to de duplicate the data considering all the four columns ,such that my final output will be like below
U1 Program_Id A_ID Date
x1 A 1 1
x1 A 2 1
x1 A 3 1
x2 B 1 1
x2 B 2 1
对于同一日期,存在
ie X1 A
和 X1 B
,因此这些是需要删除的重复项,仅保留 X1 A
行, X1 B
行是此处的重复项.
ie X1 A
exists and X1 B
also exist ,for the same Date hence those are duplicates that need to be removed , keeping just X1 A
rows , X1 B
rows are the duplicates here.
推荐答案
以下是BigQuery标准SQL
Below is for BigQuery Standard SQL
#standardSQL
SELECT U1, MIN(Program_Id) Program_Id, A_ID, DATE
FROM `project.dataset.table`
GROUP BY U1, A_ID, DATE
您可以使用问题中的虚拟数据来测试/玩游戏
You can test / play with above using dummy data from your question
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'x1' U1, 'A' Program_Id, 1 A_ID, 1 DATE UNION ALL
SELECT 'x1', 'A', 2, 1 UNION ALL
SELECT 'x1', 'A', 3, 1 UNION ALL
SELECT 'x1', 'B', 1, 1 UNION ALL
SELECT 'x1', 'B', 2, 1 UNION ALL
SELECT 'x1', 'B', 3, 1 UNION ALL
SELECT 'x2', 'B', 1, 1 UNION ALL
SELECT 'x2', 'B', 2, 1
)
SELECT U1, MIN(Program_Id) Program_Id, A_ID, DATE
FROM `project.dataset.table`
GROUP BY U1, A_ID, DATE
ORDER BY DATE, U1, A_ID
结果是
U1 Program_Id A_ID DATE
x1 A 1 1
x1 A 2 1
x1 A 3 1
x2 B 1 1
x2 B 2 1
这篇关于删除bigquery中的重复数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文