How to extract all the keys in a JSON object with BigQuery


Question

BigQuery has facilities to parse JSON in real-time interactive queries: Just store the JSON encoded object as a string, and query in real time, with functions like JSON_EXTRACT_SCALAR.

However, I can't find a way to discover all the keys (properties) in these objects.

Can I use a UDF for this?

Solution

The version below fixes some issues in the original answer:
1. Only the first level of keys was emitted.
2. The final query for extracting information based on the discovered keys had to be compiled and run manually.

SELECT type, key, value, COUNT(1) AS weight
FROM JS(
  (SELECT json, type
     FROM [fh-bigquery:openlibrary.ol_dump_20151231@0]
     WHERE type = '/type/edition'
  ),
  json, type,                             // Input columns
  "[{name: 'type', type:'string'},        // Output schema
   {name: 'key', type:'string'},
   {name: 'value', type:'string'}]",
   "function(r, emit) {                    // The function
      var x = JSON.parse(r.json);
      processKey(x, '');
      // Walk the object recursively, building dotted key paths.
      function processKey(node, parent) {
        if (parent !== '') { parent += '.'; }
        Object.keys(node).forEach(function(key) {
          // Note: toString() throws if the value is null.
          var value = node[key].toString();
          if (value !== '[object Object]') {
            // Not a plain object: emit one row for this key path and value.
            emit({type: r.type, key: parent + key, value: value});
          } else {
            // Nested object: recurse with the extended path.
            processKey(node[key], parent + key);
          }
        });
      }
    }"
  )
GROUP EACH BY type, key, value
ORDER BY weight DESC
LIMIT 1000
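
The traversal inside the UDF can be tried locally before running it in BigQuery. The following is a minimal sketch in plain JavaScript, not the answer's exact code: flattenKeys is a hypothetical helper name, and it deliberately differs from the UDF by testing typeof instead of comparing against '[object Object]', so null values do not throw and arrays with several elements are descended into rather than stringified.

```javascript
// Hypothetical standalone version of the UDF traversal, for local testing.
// flattenKeys is an illustrative name, not part of any BigQuery API.
function flattenKeys(jsonText) {
  const rows = [];

  function processKey(node, parent) {
    const prefix = parent === '' ? '' : parent + '.';
    Object.keys(node).forEach(function (key) {
      const child = node[key];
      if (child !== null && typeof child === 'object') {
        // Objects and arrays: recurse; array indexes become path segments.
        processKey(child, prefix + key);
      } else {
        // Scalars (and null) are recorded as strings.
        rows.push({ key: prefix + key, value: String(child) });
      }
    });
  }

  const root = JSON.parse(jsonText);
  // Assumption: a top-level scalar or null has no keys to report.
  if (root !== null && typeof root === 'object') {
    processKey(root, '');
  }
  return rows;
}

// Made-up sample document shaped like the Open Library records above.
const sample = '{"type": {"key": "/type/edition"}, "revision": 2, ' +
  '"languages": [{"key": "/languages/eng"}], "notes": null}';
console.log(flattenKeys(sample));
```

For the sample, the returned rows carry the key paths type.key, revision, languages.0.key and notes, which matches the dotted naming seen in the query output.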

The result is shown below:

Row          type   key                 value                         weight     
1   /type/edition   type.key            /type/edition               25140209     
2   /type/edition   last_modified.type  /type/datetime              25140209     
3   /type/edition   created.type        /type/datetime              17092292     
4   /type/edition   languages.0.key     /languages/eng              14514830     
5   /type/edition   notes.type          /type/text                  11681480     
6   /type/edition   revision            2                            8714084     
7   /type/edition   latest_revision     2                            8704217     
8   /type/edition   revision            3                            5041680     
9   /type/edition   latest_revision     3                            5040634     
10  /type/edition   created.value       2008-04-01T03:28:50.625462   3579095     
11  /type/edition   revision            1                            3396868     
12  /type/edition   physical_format     Paperback                    3181270     
13  /type/edition   revision            4                            3053266     
14  /type/edition   latest_revision     4                            3053197     
15  /type/edition   revision            5                            2076094     
16  /type/edition   latest_revision     5                            2076072     
17  /type/edition   publish_country     nyu                          1727347     
18  /type/edition   created.value       2008-04-30T09:38:13.731961   1681227     
19  /type/edition   publish_country     enk                          1627969     
20  /type/edition   publish_places      London                       1613755     
21  /type/edition   physical_format     Hardcover                    1495864     
22  /type/edition   publish_places      New York                     1467779     
23  /type/edition   revision            6                            1437467     
24  /type/edition   latest_revision     6                            1437463     
25  /type/edition   publish_country     xxk                          1407624 
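
The weight column is simply the number of emitted rows that share the same type, key and value. As a rough illustration of what the GROUP BY, COUNT and ORDER BY steps do, here is a small JavaScript sketch over hand-written rows; countRows and the sample data are hypothetical and are not taken from the dataset.

```javascript
// Illustrative only: countRows mimics
//   GROUP BY type, key, value  +  COUNT(1) AS weight  +  ORDER BY weight DESC
// on an in-memory array of rows shaped like the UDF output.
function countRows(rows) {
  const counts = new Map();
  for (const row of rows) {
    // Serialize the grouping columns into one unambiguous map key.
    const id = JSON.stringify([row.type, row.key, row.value]);
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return Array.from(counts, function ([id, weight]) {
    const [type, key, value] = JSON.parse(id);
    return { type: type, key: key, value: value, weight: weight };
  }).sort(function (a, b) {
    return b.weight - a.weight; // largest weight first
  });
}

// Made-up rows, as if emitted by the UDF for three records.
const emitted = [
  { type: '/type/edition', key: 'revision', value: '2' },
  { type: '/type/edition', key: 'revision', value: '3' },
  { type: '/type/edition', key: 'revision', value: '2' }
];
console.log(countRows(emitted));
```

Because value is part of the grouping, the same key appears once per distinct value, which is why revision and latest_revision occupy several rows in the table above.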
