How to extract all the keys in a JSON object with BigQuery
BigQuery can parse JSON in real-time interactive queries: just store the JSON-encoded object as a string, and query it in real time with functions like JSON_EXTRACT_SCALAR.
However, I can't find a way to discover all the keys (properties) in these objects.
Can I use a UDF for this?
The version below fixes some issues in the original answer:
1. Only the first level of keys was emitted.
2. The final query for extracting values based on the discovered keys had to be compiled and run manually.
SELECT type, key, value, COUNT(1) AS weight
FROM JS(
  (SELECT json, type
   FROM [fh-bigquery:openlibrary.ol_dump_20151231@0]
   WHERE type = '/type/edition'
  ),
  json, type, // Input columns
  "[{name: 'type', type:'string'}, // Output schema
    {name: 'key', type:'string'},
    {name: 'value', type:'string'}]",
  "function(r, emit) { // The function
    var x = JSON.parse(r.json);
    processKey(x, '');
    // Walk the object recursively, emitting one row per leaf key/value pair.
    function processKey(node, parent) {
      if (parent !== '') { parent += '.'; }
      Object.keys(node).forEach(function(key) {
        // String() also handles null, where .toString() would throw.
        var value = String(node[key]);
        if (value !== '[object Object]') {
          emit({type: r.type, key: parent + key, value: value});
        } else {
          processKey(node[key], parent + key);
        }
      });
    }
  }"
)
GROUP EACH BY type, key, value
ORDER BY weight DESC
LIMIT 1000
The result looks like this:
Row  type           key                 value                       weight
1    /type/edition  type.key            /type/edition               25140209
2    /type/edition  last_modified.type  /type/datetime              25140209
3    /type/edition  created.type        /type/datetime              17092292
4    /type/edition  languages.0.key     /languages/eng              14514830
5    /type/edition  notes.type          /type/text                  11681480
6    /type/edition  revision            2                           8714084
7    /type/edition  latest_revision     2                           8704217
8    /type/edition  revision            3                           5041680
9    /type/edition  latest_revision     3                           5040634
10   /type/edition  created.value       2008-04-01T03:28:50.625462  3579095
11   /type/edition  revision            1                           3396868
12   /type/edition  physical_format     Paperback                   3181270
13   /type/edition  revision            4                           3053266
14   /type/edition  latest_revision     4                           3053197
15   /type/edition  revision            5                           2076094
16   /type/edition  latest_revision     5                           2076072
17   /type/edition  publish_country     nyu                         1727347
18   /type/edition  created.value       2008-04-30T09:38:13.731961  1681227
19   /type/edition  publish_country     enk                         1627969
20   /type/edition  publish_places      London                      1613755
21   /type/edition  physical_format     Hardcover                   1495864
22   /type/edition  publish_places      New York                    1467779
23   /type/edition  revision            6                           1437467
24   /type/edition  latest_revision     6                           1437463
25   /type/edition  publish_country     xxk                         1407624
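To try the key-walking logic outside BigQuery, here is a minimal standalone sketch of the same recursion in plain JavaScript. The helper name flattenKeys and the sample record are made up for illustration and are not part of the original answer or of any BigQuery API. It uses String() instead of .toString() so that a null value does not throw; otherwise it is meant to mirror the UDF above.

```javascript
// Hypothetical helper: collects dotted key paths and stringified leaf values.
// Mirrors the UDF's test: anything that stringifies to '[object Object]'
// is treated as a nested node and walked recursively.
function flattenKeys(node, parent, out) {
  var prefix = parent === '' ? '' : parent + '.';
  Object.keys(node).forEach(function (key) {
    var value = String(node[key]); // String() tolerates null
    if (value !== '[object Object]') {
      out.push({ key: prefix + key, value: value });
    } else {
      flattenKeys(node[key], prefix + key, out);
    }
  });
  return out;
}

// Made-up sample record shaped loosely like an Open Library edition.
var record = JSON.parse(
  '{"type": {"key": "/type/edition"}, "revision": 2, ' +
  '"languages": [{"key": "/languages/eng"}], ' +
  '"publish_places": ["London", "New York"]}'
);

var pairs = flattenKeys(record, '', []);
pairs.forEach(function (p) {
  console.log(p.key + ' = ' + p.value);
});
// type.key = /type/edition
// revision = 2
// languages.0.key = /languages/eng
// publish_places = London,New York
```

Note how arrays behave under this test: a one-element array holding an object stringifies to '[object Object]' and is walked (hence keys like languages.0.key in the result), while an array of plain strings is emitted as a single comma-joined value under the array's own key.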