BigQuery UPDATE嵌套数组字段 [英] BigQuery UPDATE nested array field

查看:58
本文介绍了BigQuery UPDATE嵌套数组字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要使用另一个表中的值更新一个表中的嵌套字段. 使用此解决方案,我想出了一些可行的方法,但与我的不完全相同想要它. 这是我的解决方案:

I need to update nested field in one table, using the value from another table. Using this solution i came up with something that works, but not exactly as i want it to. Here is my solution:

#standardSQL
UPDATE
  `attribution.daily_sessions_20180301_copy1` AS target
SET
hits = ARRAY(
  SELECT AS STRUCT * REPLACE(ARRAY(
    SELECT AS STRUCT *
    FROM(
      SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)
)
FROM
  `attribution.attribute_category_map`
AS map
WHERE
  (
    SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,
    UNNEST(h.product) AS prod LIMIT 1) = map.raw_name

attribute_category_map是一个具有两列的表,我在其中查找第1列中的对应值,并用第2列中的值替换目标表中的数据.仅适用于第一个嵌套字段,而不用特定值更新每个嵌套字段.

attribute_category_map is a table with two columns where i look for corresponding value in column 1 and replace data in target table with value from column 2. Best result i achieved - updated all nested fields on one row with the same value, wich is only correct for the first nested field, instead of updating each nested field with specific value.

主表的简化模式:

[  
   {  
      "name":"sessionId",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"hits",
      "type":"RECORD",
      "mode":"REPEATED",
      "fields":[  
         {  
            "name":"product",
            "type":"RECORD",
            "mode":"REPEATED",
            "fields":[  
               {  
                  "name":"productCategory",
                  "type":"STRING",
                  "mode":"NULLABLE"
               },
               {  
                  "name":"productCategoryAttribute",
                  "type":"STRING",
                  "mode":"NULLABLE"
               }
            ]
         }
      ]
   }
]

会话行中通常有多个匹配,一个匹配中通常包含多个产品.值看起来像那些值(如果您嵌套的话):

There are usually several hits within session row and several products within one hit. With values looking like those (if you unnest):

-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1         | automotive chemicals        | null
1         | automotive tools            | null
1         | null                        | null
2         | null                        | null
2         | automotive chemicals        | null
2         | null                        | null
3         | null                        | null
3         | bed accessories             | null
4         | null                        | null
4         | null                        | null
4         | automotive chemicals        | null
4         | null                        | null
-----------------------------------------------------------------------------

地图表的架构:

[  
   {  
      "name":"raw_name",
      "type":"STRING",
      "mode":"NULLABLE"
   },
   {  
      "name":"category",
      "type":"STRING",
      "mode":"NULLABLE"
   }
]

具有这样的值:

---------------------------------------------------
raw_name              |category                   |
---------------------------------------------------
automotive chemicals  |d1y2 - automotive chemicals|
automotive paint      |dijf1 - automotive paint   |
automotive tools      |efw1 - automotive tools    |
baby & infant toys    |wwfw - baby & infant toys  |
batteries & power     |fdsv- batteries & power    |
bed accessories       |0k77 - bed accessories     |
bike racks            |12df - bike racks          |
--------------------------------------------------

结果是我想要的:

-----------------------------------------------------------------------------
    sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
    1         | automotive chemicals        | d1y2 - automotive chemicals
    1         | automotive tools            | efw1 - automotive tools
    1         | null                        | null
    2         | null                        | null
    2         | automotive chemicals        | d1y2 - automotive chemicals
    2         | null                        | null
    3         | null                        | null
    3         | bed accessories             | 0k77 - bed accessories
    4         | null                        | null
    4         | null                        | null
    4         | automotive chemicals        | d1y2 - automotive chemicals
    4         | null                        | null
    -----------------------------------------------------------------------------

我需要从主表中获取值productCategory,在map表中的raw_name列中查找它,从colum类别中获取值,并将其放入主表的productCategoryAttribute列中.主要问题是目标字段是双重嵌套的,我不知道如何直接加入它们.

I need to take value productCategory from main table, look it up in map table in column raw_name, take value from colum category and put it to productCategoryAttribute column of main table. Main problem is target fields are double nested and i can't figure out how to join directly to them.

推荐答案

以下内容已通过测试!
保留整个表的架构/数据不变,并且仅根据各自的映射更新productCategoryAttribute的值

Below is tested!
Leaves whole table's schema/data as is and only updates values of productCategoryAttribute based on the respective mapping

#standardSQL
UPDATE `project.dataset.your_table` t
SET hits = 
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      ARRAY(
        SELECT AS STRUCT product.* REPLACE(
          CASE WHEN map.raw_name = product.productCategory THEN category 
            ELSE productCategoryAttribute END AS productCategoryAttribute)
        FROM UNNEST(product) product
        LEFT JOIN UNNEST(agg_map.map) map 
        ON map.raw_name = product.productCategory
      ) AS product)
    FROM UNNEST(hits) hit
  ) 
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map 
WHERE TRUE   

注意:上面的解决方案假定映射表不是那么大,因为它依赖于将整个映射表聚合到一个数组中

Note: above solution assumes that map table is not that big as it relies on aggregating whole map table into one array

这篇关于BigQuery UPDATE嵌套数组字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆