BigQuery UPDATE 嵌套数组字段 [英] BigQuery UPDATE nested array field

查看:23
本文介绍了BigQuery UPDATE 嵌套数组字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要使用另一个表中的值更新一个表中的嵌套字段.使用 这个解决方案,我想出了一些有效的方法,但与我不完全一样想要它.这是我的解决方案:

#standardSQL更新`attribution.daily_sessions_20180301_copy1` AS 目标放点击数 = 数组(选择为结构 * 替换(数组(选择为结构 *从(SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits))从`attribution.attribute_category_map`AS地图在哪里(SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

attribute_category_map 是一个包含两列的表,我在其中查找第 1 列中的相应值,并用第 2 列中的值替换目标表中的数据.我取得的最佳结果 - 用相同的值更新了一行上的所有嵌套字段,至多是仅对第一个嵌套字段进行修正,而不是使用特定值更新每个嵌套字段.

主表的简化模式:

<预><代码>[{"name":"sessionId","type":"STRING",模式":NULLABLE"},{"name":"命中","类型":"记录","模式":"重复",领域":[{"名称":"产品","类型":"记录","模式":"重复",领域":[{"name":"产品类别","type":"STRING",模式":NULLABLE"},{"name":"productCategoryAttribute","type":"STRING",模式":NULLABLE"}]}]}]

会话行中通常有多个点击,一个点击中有多个产品.值看起来像那些(如果你取消嵌套):

-----------------------------------------------------------------------------会话 ID |hits.product.productCategory|hit.product.productCategoryAttribute-----------------------------------------------------------------------------1 |汽车化学品|空值1 |汽车工具|空值1 |空|空值2 |空|空值2 |汽车化学品|空值2 |空|空值3 |空|空值3 |床具|空值4 |空|空值4 |空|空值4 |汽车化学品|空值4 |空|空值-----------------------------------------------------------------------------

映射表的架构:

<预><代码>[{"name":"raw_name","type":"STRING",模式":NULLABLE"},{"名称":"类别","type":"STRING",模式":NULLABLE"}]

具有如下值:

---------------------------------------------------raw_name |类别 |--------------------------------------------------——汽车化学品|d1y2 - 汽车化学品|汽车漆|dijf1 - 汽车漆|汽车工具|efw1 - 汽车工具| 家企业宝贝&婴儿玩具|wwfw - baby &婴儿玩具|电池&电源 |fdsv- 电池和电源 |床配件|0k77 - 床配件|自行车架|12df - 自行车架|--------------------------------------------------

我想要的结果:

-----------------------------------------------------------------------------会话 ID |hits.product.productCategory|hit.product.productCategoryAttribute-----------------------------------------------------------------------------1 |汽车化学品|d1y2 - 汽车化学品1 |汽车工具|efw1 - 汽车工具1 |空|空值2 |空|空值2 |汽车化学品|d1y2 - 汽车化学品2 |空|空值3 |空|空值3 |床具|0k77 - 床具配件4 |空|空值4 |空|空值4 |汽车化学品|d1y2 - 汽车化学品4 |空|空值-----------------------------------------------------------------------------

我需要从主表中获取productCategory 的值,在raw_name 列的映射表中查找它,从colum category 中获取值并将其放入主表的productCategoryAttribute 列中.主要问题是目标字段是双重嵌套的,我不知道如何直接加入它们.

解决方案

以下已测试!
保持整个表的架构/数据不变,仅根据相应的映射更新 productCategoryAttribute 的值

#standardSQL更新`project.dataset.your_table` tSET 命中 =大批(选择为结构 * 替换(大批(选择作为结构产品.* 替换(CASE WHEN map.raw_name = product.productCategory THEN categoryELSE productCategoryAttribute END AS productCategoryAttribute)来自 UNNEST(product) 产品左连接 UNNEST(agg_map.map) 地图ON map.raw_name = product.productCategory) 作为产品)FROM UNNEST(hits) 命中)FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map哪里是真的

注意:上面的解决方案假设地图表不是那么大,因为它依赖于将整个地图表聚合成一个数组

I need to update nested field in one table, using the value from another table. Using this solution i came up with something that works, but not exactly as i want it to. Here is my solution:

#standardSQL
UPDATE
  `attribution.daily_sessions_20180301_copy1` AS target
SET
hits = ARRAY(
  SELECT AS STRUCT * REPLACE(ARRAY(
    SELECT AS STRUCT *
    FROM(
      SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)
)
FROM
  `attribution.attribute_category_map`
AS map
WHERE
  (
    SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,
    UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

attribute_category_map is a table with two columns where i look for corresponding value in column 1 and replace data in target table with value from column 2. Best result i achieved - updated all nested fields on one row with the same value, wich is only correct for the first nested field, instead of updating each nested field with specific value.

Simplified schema of the main table:

[  
   {  
      "name":"sessionId",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"hits",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[  
         {  
            "name":"product",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":[  
               {  
                  "name":"productCategory",
                  "type":"STRING",
                  "mode":"NULLABLE"
               },
               {  
                  "name":"productCategoryAttribute",
                  "type":"STRING",
                  "mode":"NULLABLE"
               }
            ]
         }
      ]
   }
]

There are usually several hits within session row and several products within one hit. With values looking like those (if you unnest):

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | null
1         | automotive tools            | null
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | null
2         | null                        | null
3         | null                        | null
3         | bed accessories             | null
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | null
4         | null                        | null
-----------------------------------------------------------------------------

Schema of the map table:

[  
   {  
      "name":"raw_name",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"category",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

with values like this:

---------------------------------------------------
raw_name              |category                   |
---------------------------------------------------
automotive chemicals  |d1y2 - automotive chemicals|
automotive paint      |dijf1 - automotive paint   |
automotive tools      |efw1 - automotive tools    |
baby & infant toys    |wwfw - baby & infant toys  |
batteries & power     |fdsv- batteries & power    |
bed accessories       |0k77 - bed accessories     |
bike racks            |12df - bike racks          |
--------------------------------------------------

What i want as a result:

-----------------------------------------------------------------------------
    sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
    1         | automotive chemicals        | d1y2 - automotive chemicals
    1         | automotive tools            | efw1 - automotive tools
    1         | null                        | null
    2         | null                        | null
    2         | automotive chemicals        | d1y2 - automotive chemicals
    2         | null                        | null
    3         | null                        | null
    3         | bed accessories             | 0k77 - bed accessories
    4         | null                        | null
    4         | null                        | null
    4         | automotive chemicals        | d1y2 - automotive chemicals
    4         | null                        | null
    -----------------------------------------------------------------------------

I need to take value productCategory from main table, look it up in map table in column raw_name, take value from colum category and put it to productCategoryAttribute column of main table. Main problem is target fields are double nested and i can't figure out how to join directly to them.

解决方案

Below is tested!
Leaves whole table's schema/data as is and only updates values of productCategoryAttribute based on the respective mapping

#standardSQL
UPDATE `project.dataset.your_table` t
SET hits = 
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      ARRAY(
        SELECT AS STRUCT product.* REPLACE(
          CASE WHEN map.raw_name = product.productCategory THEN category 
            ELSE productCategoryAttribute END AS productCategoryAttribute)
        FROM UNNEST(product) product
        LEFT JOIN UNNEST(agg_map.map) map 
        ON map.raw_name = product.productCategory
      ) AS product)
    FROM UNNEST(hits) hit
  ) 
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map 
WHERE TRUE   

Note: above solution assumes that map table is not that big as it relies on aggregating whole map table into one array

这篇关于BigQuery UPDATE 嵌套数组字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆