UNNESTING multiple arrays in BigQuery

Posted: 2018/5/7 17:37:41 (tags: sql, google-bigquery)


Problem description

In this example, I have a book database, with one record per book. The records contain the book owners, the genre, and some other info. I need to return a sample of the top 20 per owner, per genre, along with all the data in the row.

I have this code, which does what I need for one data point in the row (Data_one):

WITH `project.dataset.table` AS (
  SELECT 
    Name name, 
    Genre genre, 
    Data_one org
  FROM `project.dataset.booktable`
), search AS (
  SELECT name, genre FROM
  UNNEST(['Alex','James']) name, 
  UNNEST(['HORROR','COMEDY']) genre
)
SELECT name, genre, org 
FROM (
  SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs
  FROM `project.dataset.table` t JOIN search s 
  ON LOWER(s.name) = LOWER(t.name) 
  AND LOWER(s.genre) = LOWER(t.genre) 
  WHERE RAND() < 0.5
  GROUP BY t.name, t.genre
), UNNEST(orgs) org
ORDER BY name, genre, org

But when I try to extend it to work for a second piece (and eventually quite a few pieces) of data from the row, it inflates the number of records returned by a factor of 200:

WITH `project.dataset.table` AS (
  SELECT 
    Name name, 
    Genre genre, 
    Data_one org,
    Data_two org2
  FROM `project.dataset.booktable`
), search AS (
  SELECT name, genre FROM
  UNNEST(['Alex','James']) name, 
  UNNEST(['HORROR','COMEDY']) genre
)
SELECT name, genre, org, org2 
FROM (
  SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs, ARRAY_AGG(t.org2 LIMIT 20) orgs2
  FROM `project.dataset.table` t JOIN search s 
  ON LOWER(s.name) = LOWER(t.name) 
  AND LOWER(s.genre) = LOWER(t.genre) 
  WHERE RAND() < 0.5
  GROUP BY t.name, t.genre
), UNNEST(orgs) org, UNNEST(orgs2) org2
ORDER BY name, genre, org, org2

I know UNNEST turns an array into a table, but is this somehow creating an array of arrays and unnesting that? I am unfamiliar with the syntax.
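
For reference, a minimal BigQuery sketch of what the comma between two UNNEST calls does. The literal arrays are illustrative stand-ins for orgs and orgs2, not data from the question: each comma-joined UNNEST is a cross join with the rows produced so far, not an array of arrays.

```sql
#standardSQL
-- Illustrative literal arrays standing in for orgs and orgs2.
-- The comma is a CROSS JOIN, so 3 orgs x 2 orgs2 yield 3 * 2 = 6 rows.
SELECT org, org2
FROM UNNEST(['a', 'b', 'c']) org,
  UNNEST(['x', 'y']) org2
ORDER BY org, org2
```

Applied to the query above, a group whose orgs and orgs2 each hold 20 elements comes back as 20 * 20 = 400 rows instead of 20.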

Edit:
The data I am trying to get is all on the same level: all single data points (no arrays), a mixture of NULLABLE STRINGs, INTEGERs, TIMESTAMPs and FLOATs.

E.g.:
Genre   STRING  NULLABLE
Name    STRING  NULLABLE    
Data_one    STRING  NULLABLE    
Data_two    STRING  NULLABLE    
Data_three  INTEGER NULLABLE    
Data_four   TIMESTAMP   NULLABLE    

Owner   |   Genre    |   Data_one    | Data_two   |Data_three|Data_four
Alex    |   Horror   |  Stephen King |    IT      |    3     |2018-01-02
Alex    |   Sci-fi   |   Andy Weir   |The Martian |    5     |2018-01-02
James   |   Horror   |  Bram Stoker  |   Dracula  |    2     |2018-01-02
Sarah   |   Horror   |  Stephen King | The Stand  |    3     |2018-01-02
James   |   Horror   |  Stephen King |Pet Sematary|    2     |2018-01-02

Solution
As your question lacks details, the answer below is just a direction for you to explore:

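#standardSQL
-- Reuses the `project.dataset.table` and `search` CTEs from the question (place that WITH clause before SELECT).
-- Caveat: two ARRAY_AGG calls without ORDER BY are not guaranteed to collect rows in the same order,
-- so equal offsets alone do not guarantee that org and org2 came from the same source row.
SELECT name, genre, data_one, data_two FROM (
  SELECT t.name, t.genre, ARRAY_AGG(t.org LIMIT 20) orgs, ARRAY_AGG(t.org2 LIMIT 20) orgs2
  FROM `project.dataset.table` t JOIN search s 
  ON LOWER(s.name) = LOWER(t.name) 
  AND LOWER(s.genre) = LOWER(t.genre) 
  WHERE RAND() < 0.5
  GROUP BY t.name, t.genre
), UNNEST(orgs) data_one WITH OFFSET pos1
, UNNEST(orgs2) data_two WITH OFFSET pos2
-- Keep only combinations whose elements sit at the same position in both arrays
WHERE pos1 = pos2
ORDER BY name, genre, data_one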

As you can see, WITH OFFSET is introduced here to identify the position of each element within its array, and the result then keeps only those combinations whose elements have the same position.
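
The same idea on two literal arrays (illustrative values, not from the question): WITH OFFSET adds each element's 0-based position, and the pos1 = pos2 filter reduces the 3 x 3 cross join to one row per position.

```sql
#standardSQL
-- Literal arrays stand in for orgs and orgs2; the values are illustrative only.
SELECT org, org2, pos1
FROM UNNEST(['a', 'b', 'c']) org WITH OFFSET pos1,
  UNNEST(['x', 'y', 'z']) org2 WITH OFFSET pos2
-- Without this filter the query would return 3 * 3 = 9 rows
WHERE pos1 = pos2
ORDER BY pos1
```

An alternative that never builds the cross join is to unnest only one array and index into the other, e.g. orgs2[SAFE_OFFSET(pos1)], which returns NULL when orgs2 is shorter than orgs.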

In a real use case, you most likely have yet another field that identifies which data_one and data_two belong to the same row, and that field can be used to pair them.
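
A minimal sketch of pairing by such a field. It assumes a hypothetical book_id column that uniquely identifies each source row (book_id is not in the question's schema); the two rows are James's Horror books from the example table.

```sql
#standardSQL
-- book_id is hypothetical: a unique per-row key assumed for illustration.
WITH books AS (
  SELECT 1 AS book_id, 'James' AS name, 'Horror' AS genre,
    'Bram Stoker' AS data_one, 'Dracula' AS data_two UNION ALL
  SELECT 2, 'James', 'Horror', 'Stephen King', 'Pet Sematary'
), agg AS (
  -- Carry the key alongside each value in both arrays
  SELECT name, genre,
    ARRAY_AGG(STRUCT(book_id, data_one)) ones,
    ARRAY_AGG(STRUCT(book_id, data_two)) twos
  FROM books
  GROUP BY name, genre
)
SELECT name, genre, o.data_one, t.data_two
FROM agg, UNNEST(ones) o, UNNEST(twos) t
-- Matching on the key keeps 2 of the 2 * 2 = 4 cross-joined rows
WHERE o.book_id = t.book_id
ORDER BY o.book_id
```

In practice, aggregating one STRUCT per row, as in the update below, avoids the second array and the join altogether.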

Hope this gives you a direction.

Update

As you have added a schema and an example, see below:

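#standardSQL
-- Assumes `project.dataset.table` has the columns from your schema (name, genre, data_one ... data_four)
-- and that the `search` CTE from the question is defined in a WITH clause before SELECT.
SELECT name, genre, data.data_one, data.data_two, data.data_three, data.data_four 
FROM (
  SELECT t.name, t.genre, 
    -- One STRUCT per source row keeps all of that row's fields together, so a single UNNEST
    -- returns each sampled row once instead of cross-joining separate arrays
    ARRAY_AGG(STRUCT(data_one, data_two, data_three, data_four) LIMIT 20) data
  FROM `project.dataset.table` t JOIN search s 
  ON LOWER(s.name) = LOWER(t.name) 
  AND LOWER(s.genre) = LOWER(t.genre) 
  WHERE RAND() < 0.5
  GROUP BY t.name, t.genre
), UNNEST(data) data
ORDER BY name, genre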

That is exactly what I mentioned in the comments on your very first related question in another post: you can just use org.data_one, org.data_two in your SELECT statement.
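
For a self-contained check, the update's STRUCT approach can be run against the five sample rows from the question. This is a sketch: the books CTE stands in for `project.dataset.booktable`, the search CTE is copied from the question, and the RAND() filter is dropped so the result is deterministic.

```sql
#standardSQL
-- Sample rows copied from the question's example table; books replaces the real table.
WITH books AS (
  SELECT 'Alex' AS name, 'Horror' AS genre, 'Stephen King' AS data_one,
    'IT' AS data_two, 3 AS data_three, TIMESTAMP '2018-01-02' AS data_four UNION ALL
  SELECT 'Alex', 'Sci-fi', 'Andy Weir', 'The Martian', 5, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'James', 'Horror', 'Bram Stoker', 'Dracula', 2, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'Sarah', 'Horror', 'Stephen King', 'The Stand', 3, TIMESTAMP '2018-01-02' UNION ALL
  SELECT 'James', 'Horror', 'Stephen King', 'Pet Sematary', 2, TIMESTAMP '2018-01-02'
), search AS (
  SELECT name, genre
  FROM UNNEST(['Alex', 'James']) name, UNNEST(['HORROR', 'COMEDY']) genre
)
SELECT name, genre, data.data_one, data.data_two, data.data_three, data.data_four
FROM (
  SELECT t.name, t.genre,
    -- One STRUCT per book keeps its STRING, INT64 and TIMESTAMP fields together
    ARRAY_AGG(STRUCT(data_one, data_two, data_three, data_four) LIMIT 20) data
  FROM books t JOIN search s
  ON LOWER(s.name) = LOWER(t.name)
  AND LOWER(s.genre) = LOWER(t.genre)
  GROUP BY t.name, t.genre
), UNNEST(data) data
ORDER BY name, genre, data_one
```

With these rows only Alex's and James's Horror books match the search pairs, so the query returns three rows, one per matching book, with no cross-join inflation.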
