BigQuery:查找类型为RECORD的ID的查找数组,并使用SQL从辅助表联接数据 [英] BigQuery: Lookup array of ids type RECORD and join data from secondary table using SQL

查看:119
本文介绍了BigQuery:查找类型为RECORD的ID的查找数组,并使用SQL从辅助表联接数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据结构如下:

产品

| name  | region_ids             |
----------------------------------
| shoe  | c32, a43, x53          |
| hat   | c32, f42               |

# Schema
name                STRING  NULLABLE
region_ids          RECORD  REPEATED    
region_ids.value    STRING  NULLABLE    

地区

| _id |  name       |
---------------------
| c32 |  london     |
| a43 |  manchester |
| x53 |  bristol    |
| f42 |  liverpool  |

# Schema
_id                 STRING  NULLABLE
name                STRING  NULLABLE

我要查找"region_ids"数组并用区域名称替换它们,以产生如下表:

I want to look up the array of "region_ids" and replace them by the region name to result in a table like below:

| _id |  name  | region_names                |
----------------------------------------------
| d22 |  shoe  | london, manchester, bristol |
| t64 |  hat   | london, liverpool           |

使用标准SQL做到这一点的最佳方法是什么?

What is the best way to do this using standard SQL?

谢谢

A

推荐答案

下面是BigQuery标准SQL

Below is for BigQuery Standard SQL

#standardSQL
SELECT p._id, p.name, 
  STRING_AGG(r.name, ', ' ORDER BY OFFSET) AS region_names 
FROM `project.dataset.Products` p,
UNNEST(region_ids) WITH OFFSET
LEFT JOIN `project.dataset.Regions` r
ON value = r._id
GROUP BY _id, name   

您可以使用问题中的示例数据来测试,玩转上面的示例

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.Products` AS (
  SELECT 'd22' _id, 'shoe' name, [STRUCT<value STRING>('c32'), STRUCT('a43'), STRUCT('x53')] region_ids UNION ALL
  SELECT 't64', 'hat', [STRUCT<value STRING>('c32'), STRUCT('f42')]
), `project.dataset.Regions` AS (
  SELECT 'c32' _id, 'london' name UNION ALL
  SELECT 'a43', 'manchester' UNION ALL
  SELECT 'x53', 'bristol' UNION ALL
  SELECT 'f42', 'liverpool' 
)
SELECT p._id, p.name, 
  STRING_AGG(r.name, ', ' ORDER BY OFFSET) AS region_names 
FROM `project.dataset.Products` p,
UNNEST(region_ids) WITH OFFSET
LEFT JOIN `project.dataset.Regions` r
ON value = r._id
GROUP BY _id, name   

结果是

Row _id name    region_names     
1   d22 shoe    london, manchester, bristol  
2   t64 hat     london, liverpool      

根据您问题中的输出示例-您希望region_names作为带有逗号分隔名称列表的字符串
但是,如果需要region_names作为数组-您可以将STRING_AGG(r.name, ', ' ORDER BY OFFSET)替换为ARRAY_AGG(r.name ORDER BY OFFSET)

Based on output example in your question - you expect region_names as string with list of comma separated names
But, if you need region_names as an array - you can replace STRING_AGG(r.name, ', ' ORDER BY OFFSET) with ARRAY_AGG(r.name ORDER BY OFFSET)

这篇关于BigQuery:查找类型为RECORD的ID的查找数组,并使用SQL从辅助表联接数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆