Bigquery - json_extract 从数组中提取所有元素 [英] Bigquery - json_extract all elements from an array

查看:70
本文介绍了Bigquery - json_extract 从数组中提取所有元素的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从一组 jsons 中的每个 json 中提取两个键(使用 sql legacy)目前我正在使用 json 提取功能:

json_extract(json_column , '$[1].X') AS X,json_extract(json_column, '$[1].Y') AS Y,

如何让它在json arry 列"中的每个 json 上运行,而不仅仅是 [1](例如)?

示例 json:

<预><代码>[{"blabla":000,"X":1,"blabla":000,"blabla":000,"blabla":000,,"Y":"2"},{"blabla":000,"X":3,"blabla":000,"blabla":000,"blabla":000,,"Y":"4"},]

提前致谢!

解决方案

2020 年更新:JSON_EXTRACT_ARRAY()

现在 BigQuery 支持 JSON_EXTRACT_ARRAY():

  • <小时>

    上一个回答

    让我们从一个类似的问题开始——这不是从 json 数组中提取所有电子邮件的一种非常方便的方法:

    SELECT id, [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email'), JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email'), JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email'), JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')] 电子邮件来自`githubarchive.day.20180830`WHERE type='PushEvent'和 id='8188163772'

    我们现在处理这个问题的最好方法是在 UDF 中使用一些 JavaScript 将 json 数组拆分为 SQL 数组:

    CREATE TEMP FUNCTION json2array(json STRING)返回数组语言 js 为 """返回 JSON.parse(json).map(x=>JSON.stringify(x));""";SELECT * EXCEPT(array_commits),ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) 电子邮件从 (选择 ID, json2array(JSON_EXTRACT(payload, '$.commits')) array_commits来自`githubarchive.day.20180830`WHERE type='PushEvent'和 id='8188163772')

    i'm trying to extract two key from every json in an arry of jsons(using sql legacy) currently i am using json extract function :

    json_extract(json_column , '$[1].X') AS X,
    json_extract(json_column , '$[1].Y') AS Y,
    

    how can i make it run on every json at the 'json arry column', and not just [1] (for example)?

    An example json:

    [
    
    {"blabla":000,"X":1,"blabla":000,"blabla":000,"blabla":000,,"Y":"2"},
    
    {"blabla":000,"X":3,"blabla":000,"blabla":000,"blabla":000,,"Y":"4"},
    
    ]   
    

    thanks in advance!

    解决方案

    Update 2020: JSON_EXTRACT_ARRAY()

    Now BigQuery supports JSON_EXTRACT_ARRAY():

    For example, to solve this particular question:

    SELECT id
      , ARRAY(
          SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') 
          FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits"))x
      ) emails
    FROM `githubarchive.day.20180830` 
    WHERE type='PushEvent' 
    AND id='8188163772'
    


    Previous answer

    Let's start with a similar problem - this is not a very convenient way to extract all emails from a json array:

    SELECT id
      , [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
        ] emails
    FROM `githubarchive.day.20180830` 
    WHERE type='PushEvent' 
    AND id='8188163772'
    

    The best way we have right now to deal with this is to use some JavaScript in an UDF to split a json-array into a SQL array:

    CREATE TEMP FUNCTION json2array(json STRING)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
      return JSON.parse(json).map(x=>JSON.stringify(x));
    """; 
    
    SELECT * EXCEPT(array_commits),
      ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
    FROM (
      SELECT id
        , json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
      FROM `githubarchive.day.20180830` 
      WHERE type='PushEvent' 
      AND id='8188163772'
    )
    

    这篇关于Bigquery - json_extract 从数组中提取所有元素的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆