How to compare two tables having a record type column in BigQuery

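Problem description

I have two nested tables: one is the source table and the other is the target table. I want to compare the nested columns of the source and target tables, in order to check whether the weather data in the source table is being updated. Is there a SQL query in BigQuery that can do this?

Here are the approaches I have used so far to compare two tables with nested records:

1. This is the first approach: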

SELECT to_json_string(info) FROM dataset.nested_table_source
except distinct
SELECT to_json_string(info) FROM dataset.nested_table_target

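to_json_string() doesn't work: the function sometimes serializes the source row and the target row with their elements in a different order, so rows are reported as different even when the data in the two tables is the same.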
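The effect described above can be reproduced outside BigQuery. Below is a minimal Python sketch (not BigQuery code) that assumes the two info arrays hold the same elements in a different stored order; the sample rows and helper names are hypothetical. Serializing the array as stored is order-sensitive, while sorting the elements first gives a string that depends only on the contents, which is the idea behind the ordered comparisons used later in this thread.

```python
import json

# Hypothetical rows: the same two info elements, stored in a different order.
source_info = [{"key": "A", "value": 3}, {"key": "B", "value": 5}]
target_info = [{"key": "B", "value": 5}, {"key": "A", "value": 3}]


def serialize_as_stored(info):
    """Serialize the array in its stored element order (order-sensitive)."""
    return json.dumps(info, sort_keys=True)


def serialize_canonical(info):
    """Sort the elements by (key, value) first, so only the contents matter."""
    ordered = sorted(info, key=lambda e: (e["key"], e["value"]))
    return json.dumps(ordered, sort_keys=True)


print(serialize_as_stored(source_info) == serialize_as_stored(target_info))  # False
print(serialize_canonical(source_info) == serialize_canonical(target_info))  # True
```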

2. This is the second approach:

select name
from dataset.nested_table_source a
join dataset.nested_table_target b
using(name)
-- rows are already matched on name, so only the info arrays need comparing
-- (an extra "a.name != b.name" filter would always be false here)
where
(select string_agg(format('%t', s) order by key) from a.info s)
!= (select string_agg(format('%t', s) order by key) from b.info s)
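In this approach I use the string_agg function to compare the two nested records, but I'm not sure whether this is the right way to compare record fields.

What should I do in this case?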
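Building one string per row and comparing the strings, as the second approach does with string_agg, works as long as the element order is fully determined. The Python sketch below is only an illustration with hypothetical data and helper names: a simple "(key, value)" rendering stands in for format('%t', s), and Python's stable sort stands in for one possible outcome of ORDER BY key, since in SQL the relative order of rows with equal sort keys is not guaranteed. If a key can repeat within one info array, ordering by key alone can give different strings for the same contents; ordering by every field removes that ambiguity.

```python
# Hypothetical info arrays: key "A" appears twice, stored in different orders.
source_info = [("A", 1), ("A", 2), ("B", 5)]
target_info = [("A", 2), ("A", 1), ("B", 5)]


def agg_by_key(info):
    """Order by key only; Python's stable sort keeps ties in stored order."""
    return ",".join(f"({k}, {v})" for k, v in sorted(info, key=lambda e: e[0]))


def agg_by_full_record(info):
    """Order by key, then value, so equal contents always give equal strings."""
    return ",".join(f"({k}, {v})" for k, v in sorted(info))


print(agg_by_key(source_info) == agg_by_key(target_info))                  # False
print(agg_by_full_record(source_info) == agg_by_full_record(target_info))  # True
```

In the SQL this corresponds to ordering by every field inside string_agg (for example order by key, value) rather than by key alone.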

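Recommended answer

Here is one approach in which you basically stringify the ordered set of objects (the info column in your tables) and then compare the resulting strings with each other.

Below is an example with some dummy data: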

with source_data as (
    select 
    "VICTOR" as name,
     array[
        struct("A" as key, 3 as value),       
        struct("B" as key, 5 as value)   
    ] as info

    union all 
    
    select 
    "MAX" as name,
     array[
        struct("A" as key, 0 as value),       
        struct("B" as key, 1 as value)   
    ] as info

    union all 
    
    select 
    "SAIF" as name,
     array[
        struct("A" as key, 0 as value),       
        struct("B" as key, 1 as value)   
    ] as info
),
target_data as (
    select 
    "VICTOR" as name,
     array[
        struct("A" as key, 3 as value),       
        struct("B" as key, 15 as value)   
    ] as info

    union all 
    
    select 
    "MAX" as name,
     array[
        struct("A" as key, 0 as value),       
        struct("B" as key, 1 as value)   
    ] as info
)

select name, stringified_source_set as info from (
    select 
        s.name, 
        t.name as target_name,  -- NULL when the name exists only in the source table
        array_to_string(array(select concat(cast(x.key as string), '|', cast(x.value as string)) from unnest(t.info) as x order by x.key, x.value), '|') AS stringified_target_set,
        array_to_string(array(select concat(cast(x.key as string), '|', cast(x.value as string)) from unnest(s.info) as x order by x.key, x.value), '|') AS stringified_source_set
    from source_data s
    left join target_data t on t.name = s.name 
)
-- horizontal check: the info contents differ; vertical check: no matching target row
where (stringified_source_set != stringified_target_set) or (target_name is null)
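The same logic can be sketched in Python to check what the query should return for the dummy data. This is an illustration that assumes each info array is a list of (key, value) pairs; the function names are hypothetical. A source row is reported when its sorted, '|'-joined string differs from the target's, or when its name has no target row at all.

```python
# Same dummy data as the CTEs above: name -> list of (key, value) pairs.
source_data = {
    "VICTOR": [("A", 3), ("B", 5)],
    "MAX": [("A", 0), ("B", 1)],
    "SAIF": [("A", 0), ("B", 1)],
}
target_data = {
    "VICTOR": [("A", 3), ("B", 15)],
    "MAX": [("A", 0), ("B", 1)],
}


def stringify(info):
    """Sort elements by key, then value, and join them with '|'."""
    return "|".join(f"{k}|{v}" for k, v in sorted(info))


def compare(source, target):
    """Left-join source to target on name; keep changed or missing rows."""
    result = []
    for name, info in source.items():
        src = stringify(info)
        if name not in target:  # vertical check: no matching target row
            result.append((name, src))
        elif src != stringify(target[name]):  # horizontal check: contents differ
            result.append((name, src))
    return result


print(compare(source_data, target_data))
# [('VICTOR', 'A|3|B|5'), ('SAIF', 'A|0|B|1')]
```

With this data, VICTOR is reported because B changed from 5 to 15, SAIF because it is missing from the target, and MAX is omitted because it is identical. One caveat applies to the SQL as well: '|' separates both a key from its value and one element from the next, so keys containing '|' could make two different arrays produce the same string; a separator that cannot occur in the data avoids this.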

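Note that the approach above performs both a "horizontal comparison" (comparing the info objects of rows with the same name) and a "vertical comparison" (finding entries that exist in the source table but are missing from the target table).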
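The vertical comparison needs its own check rather than relying on the string comparison. For an unmatched source row, the left join leaves the target info NULL, the ARRAY subquery over UNNEST of it returns an empty array, and ARRAY_TO_STRING turns that into an empty string, so a source row whose own info array is empty compares equal to its missing counterpart. The Python sketch below models that case under those assumptions; is_reported is a hypothetical helper, and check_missing stands for an explicit test that the join found no target row (for example, that the target table's name is NULL).

```python
def stringify(info):
    """Sort (key, value) pairs and join them with '|', like the query."""
    return "|".join(f"{k}|{v}" for k, v in sorted(info))


def is_reported(source_info, target_info, check_missing):
    """Return True if a source row would be reported as different or missing.

    target_info is None when the left join found no target row.
    """
    # Assumption: an unmatched target side stringifies to '' (empty array).
    target_str = stringify(target_info or [])
    differs = stringify(source_info) != target_str
    missing = target_info is None
    return differs or (check_missing and missing)


# Source row with an empty info array and no counterpart in the target table.
print(is_reported([], None, check_missing=False))  # False: the row slips through
print(is_reported([], None, check_missing=True))   # True
```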
