使用 jq 展平嵌套的 JSON [英] Flatten nested JSON using jq
问题描述
我想展平一个嵌套的 json 对象,例如{"a":{"b":1}}
到 {"a.b":1}
以便在 solr 中消化它.
I'd like to flatten a nested json object, e.g. {"a":{"b":1}}
to {"a.b":1}
in order to digest it in solr.
我有 11 TB 的 json 文件,它们都是嵌套的并且在字段名称中包含点,这意味着不是 elasticsearch(点)也不是 solr(没有 _childDocument_
符号嵌套)可以按原样消化它.
I have 11 TB of json files which are both nested and contains dots in field names, meaning not elasticsearch (dots) nor solr (nested without the _childDocument_
notation) can digest it as is.
其他解决方案是用下划线替换字段名称中的点并将其推送到elasticsearch,但我对solr的体验要好得多,因此我更喜欢flatten解决方案(除非solr可以按原样消化那些嵌套的json??).
The other solutions would be to replace dots in the field names with underscores and push it to elasticsearch, but I have far better experience with solr therefore I prefer the flatten solution (unless solr can digest those nested jsons as is??).
只有在消化过程比 solr 花费的时间少得多的情况下,我才会更喜欢 elasticsearch,因为我的首要任务是尽可能快地消化(因此我选择了 jq 而不是在 python 中编写脚本).
I will prefer elasticsearch only if the digestion process will take far less time than solr, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in python).
请帮忙.
我认为这对示例 3 和 4 为我解决了这个问题:https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
I think the pair of examples 3&4 solves this for me: https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/
我会尽快尝试.
推荐答案
您还可以使用以下 jq 命令以这种方式展平嵌套的 JSON 对象:
You can also use the following jq command to flatten nested JSON objects in this manner:
[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries
它的工作方式是:leaf_paths
返回一个数组流,这些数组表示给定 JSON 文档上出现叶元素"的路径,即没有子元素的元素,例如数字、字符串和布尔值.我们将该流传输到具有 key
和 value
属性的对象中,其中 key
包含路径数组的元素作为由点和 <连接的字符串code>value 包含该路径的元素.最后,我们将整个事物放入一个数组中并在其上运行 from_entries
,这会将 {key, value}
对象的数组转换为包含这些键值对的对象.
The way it works is: leaf_paths
returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key
and value
properties, where key
contains the elements of the path array as a string joined by dots and value
contains the element at that path. Finally, we put the entire thing in an array and run from_entries
on it, which transforms an array of {key, value}
objects into an object containing those key-value pairs.
这篇关于使用 jq 展平嵌套的 JSON的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!