使用 jq 展平嵌套的 JSON [英] Flatten nested JSON using jq

查看:45
本文介绍了使用 jq 展平嵌套的 JSON的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想展平一个嵌套的 json 对象,例如{"a":{"b":1}}{"a.b":1} 以便在 solr 中消化它.

I'd like to flatten a nested json object, e.g. {"a":{"b":1}} to {"a.b":1} in order to digest it in solr.

我有 11 TB 的 json 文件,它们都是嵌套的并且在字段名称中包含点,这意味着不是 elasticsearch(点)也不是 solr(没有 _childDocument_ 符号嵌套)可以按原样消化它.

I have 11 TB of json files which are both nested and contains dots in field names, meaning not elasticsearch (dots) nor solr (nested without the _childDocument_ notation) can digest it as is.

其他解决方案是用下划线替换字段名称中的点并将其推送到elasticsearch,但我对solr的体验要好得多,因此我更喜欢flatten解决方案(除非solr可以按原样消化那些嵌套的json??).

The other solutions would be to replace dots in the field names with underscores and push it to elasticsearch, but I have far better experience with solr therefore I prefer the flatten solution (unless solr can digest those nested jsons as is??).

只有在消化过程比 solr 花费的时间少得多的情况下,我才会更喜欢 elasticsearch,因为我的首要任务是尽可能快地消化(因此我选择了 jq 而不是在 python 中编写脚本).

I will prefer elasticsearch only if the digestion process will take far less time than solr, because my priority is digesting as fast as I can (thus I chose jq instead of scripting it in python).

请帮忙.

我认为这对示例 3 和 4 为我解决了这个问题:https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/

I think the pair of examples 3&4 solves this for me: https://lucidworks.com/blog/2014/08/12/indexing-custom-json-data/

我会尽快尝试.

推荐答案

您还可以使用以下 jq 命令以这种方式展平嵌套的 JSON 对象:

You can also use the following jq command to flatten nested JSON objects in this manner:

[leaf_paths as $path | {"key": $path | join("."), "value": getpath($path)}] | from_entries

它的工作方式是:leaf_paths 返回一个数组流,这些数组表示给定 JSON 文档上出现叶元素"的路径,即没有子元素的元素,例如数字、字符串和布尔值.我们将该流传输到具有 keyvalue 属性的对象中,其中 key 包含路径数组的元素作为由点和 <连接的字符串code>value 包含该路径的元素.最后,我们将整个事物放入一个数组中并在其上运行 from_entries,这会将 {key, value} 对象的数组转换为包含这些键值对的对象.

The way it works is: leaf_paths returns a stream of arrays which represent the paths on the given JSON document at which "leaf elements" appear, that is, elements which do not have child elements, such as numbers, strings and booleans. We pipe that stream into objects with key and value properties, where key contains the elements of the path array as a string joined by dots and value contains the element at that path. Finally, we put the entire thing in an array and run from_entries on it, which transforms an array of {key, value} objects into an object containing those key-value pairs.

这篇关于使用 jq 展平嵌套的 JSON的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆