BigQuery UPDATE嵌套数组字段 [英] BigQuery UPDATE nested array field
问题描述
我需要使用另一个表中的值更新一个表中的嵌套字段. 使用此解决方案,我想出了一些可行的方法,但与我的不完全相同想要它. 这是我的解决方案:
I need to update nested field in one table, using the value from another table. Using this solution i came up with something that works, but not exactly as i want it to. Here is my solution:
#standardSQL
UPDATE
`attribution.daily_sessions_20180301_copy1` AS target
SET
hits = ARRAY(
SELECT AS STRUCT * REPLACE(ARRAY(
SELECT AS STRUCT *
FROM(
SELECT AS STRUCT * REPLACE(map.category AS productCategoryAttribute) FROM UNNEST(product))) AS product) FROM UNNEST(hits)
)
FROM
`attribution.attribute_category_map`
AS map
WHERE
(
SELECT REPLACE(LOWER(prod.productCategory), 'amp;', '') FROM UNNEST(target.hits) AS h,
UNNEST(h.product) AS prod LIMIT 1) = map.raw_name
attribute_category_map是一个具有两列的表,我在其中查找第1列中的对应值,并用第2列中的值替换目标表中的数据.仅适用于第一个嵌套字段,而不用特定值更新每个嵌套字段.
attribute_category_map is a table with two columns where i look for corresponding value in column 1 and replace data in target table with value from column 2. Best result i achieved - updated all nested fields on one row with the same value, wich is only correct for the first nested field, instead of updating each nested field with specific value.
主表的简化模式:
[
{
"name":"sessionId",
"type":"STRING",
"mode":"NULLABLE"
},
{
"name":"hits",
"type":"RECORD",
"mode":"REPEATED",
"fields":[
{
"name":"product",
"type":"RECORD",
"mode":"REPEATED",
"fields":[
{
"name":"productCategory",
"type":"STRING",
"mode":"NULLABLE"
},
{
"name":"productCategoryAttribute",
"type":"STRING",
"mode":"NULLABLE"
}
]
}
]
}
]
会话行中通常有多个匹配,一个匹配中通常包含多个产品.值看起来像那些值(如果您嵌套的话):
There are usually several hits within session row and several products within one hit. With values looking like those (if you unnest):
-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1 | automotive chemicals | null
1 | automotive tools | null
1 | null | null
2 | null | null
2 | automotive chemicals | null
2 | null | null
3 | null | null
3 | bed accessories | null
4 | null | null
4 | null | null
4 | automotive chemicals | null
4 | null | null
-----------------------------------------------------------------------------
地图表的架构:
[
{
"name":"raw_name",
"type":"STRING",
"mode":"NULLABLE"
},
{
"name":"category",
"type":"STRING",
"mode":"NULLABLE"
}
]
具有这样的值:
---------------------------------------------------
raw_name |category |
---------------------------------------------------
automotive chemicals |d1y2 - automotive chemicals|
automotive paint |dijf1 - automotive paint |
automotive tools |efw1 - automotive tools |
baby & infant toys |wwfw - baby & infant toys |
batteries & power |fdsv- batteries & power |
bed accessories |0k77 - bed accessories |
bike racks |12df - bike racks |
--------------------------------------------------
结果是我想要的:
-----------------------------------------------------------------------------
sessionId | hits.product.productCategory| hit.product.productCategoryAttribute
-----------------------------------------------------------------------------
1 | automotive chemicals | d1y2 - automotive chemicals
1 | automotive tools | efw1 - automotive tools
1 | null | null
2 | null | null
2 | automotive chemicals | d1y2 - automotive chemicals
2 | null | null
3 | null | null
3 | bed accessories | 0k77 - bed accessories
4 | null | null
4 | null | null
4 | automotive chemicals | d1y2 - automotive chemicals
4 | null | null
-----------------------------------------------------------------------------
我需要从主表中获取值productCategory,在map表中的raw_name列中查找它,从colum类别中获取值,并将其放入主表的productCategoryAttribute列中.主要问题是目标字段是双重嵌套的,我不知道如何直接加入它们.
I need to take value productCategory from main table, look it up in map table in column raw_name, take value from colum category and put it to productCategoryAttribute column of main table. Main problem is target fields are double nested and i can't figure out how to join directly to them.
推荐答案
以下内容已通过测试!
保留整个表的架构/数据不变,并且仅根据各自的映射更新productCategoryAttribute的值
Below is tested!
Leaves whole table's schema/data as is and only updates values of productCategoryAttribute based on the respective mapping
#standardSQL
UPDATE `project.dataset.your_table` t
SET hits =
ARRAY(
SELECT AS STRUCT * REPLACE(
ARRAY(
SELECT AS STRUCT product.* REPLACE(
CASE WHEN map.raw_name = product.productCategory THEN category
ELSE productCategoryAttribute END AS productCategoryAttribute)
FROM UNNEST(product) product
LEFT JOIN UNNEST(agg_map.map) map
ON map.raw_name = product.productCategory
) AS product)
FROM UNNEST(hits) hit
)
FROM (SELECT ARRAY_AGG(row) map FROM `project.dataset.map` row) agg_map
WHERE TRUE
注意:上面的解决方案假定映射表不是那么大,因为它依赖于将整个映射表聚合到一个数组中
Note: above solution assumes that map table is not that big as it relies on aggregating whole map table into one array
这篇关于BigQuery UPDATE嵌套数组字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!