Split/Slice large JSON using jq
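Problem Description
I want to split a large JSON file of about 20 GB into smaller chunks based on the size of the array (e.g. 10,000 or 50,000 elements per chunk).
Input: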
{"recDt":"2021-01-05",
"country":"US",
"name":"ABC",
"number":"9828",
"add": [
{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}
]
}
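I'm currently doing this in a loop, incrementing the x/y values to get the desired output, but performance is very slow: depending on the file size, each iteration of the split takes 8-20 seconds. I'm using jq 1.6. Is there any alternative way to get the result below?
Expected output (slices with 2 objects each in the add array):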
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828","add":[{"rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},{"rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"}]}
{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828","add":[{"rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},{"rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}]}
What I have tried:
cat $inFile | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' | jq --arg x $x --arg y $y -c '{recDt: .recDt, country: .country, name: .name, number: .number, add: .add[$x|tonumber:$y|tonumber]}' >> $outFile
cat $inFile | jq --arg x $x --arg y $y -c '{recDt: .recDt, country: .country, name: .name, number: .number, add: .add[$x|tonumber:$y|tonumber]}' >> $outFile
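If there are any alternatives available, please share.
Recommended Answer
In this response, which calls jq only once, I'll assume that your computer has enough memory to read the entire JSON. I'll also assume that you want a separate file for each slice, and that you want the JSON in each file pretty-printed.
Assuming a chunk size of 2, and output files named using the template part-N.json, you could write: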
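< input.json jq -r --argjson size 2 '
  # keep every top-level field except the (large) add array
  del(.add) as $object
  # for each slice: print a tab line as a separator, then the object with that slice
  | (.add | _nwise($size) | ("\t", $object + {add: .}))
' | awk '
  /^\t/ { fn++; next }                # separator line: move on to the next file
  { print >> ("part-" fn ".json") }   # append this line to the current part file
'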
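If the jq half of the pipeline is hard to read, the following Python sketch mirrors what it computes, under the same assumption that the whole document fits in memory. The names nwise, slice_record and SAMPLE are hypothetical: nwise imitates how jq's _nwise($size) cuts an array into consecutive chunks of at most $size elements (the last one may be shorter), and slice_record imitates del(.add) as $object followed by $object + {add: .}. It is an illustration of the logic, not a replacement for the jq command.

```python
import json
from typing import Iterator


def nwise(items: list, size: int) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items.

    Mirrors jq's _nwise($size): the last chunk may be shorter, and an
    empty array yields a single empty chunk.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    start = 0
    while len(items) - start > size:
        yield items[start:start + size]
        start += size
    yield items[start:]


def slice_record(record: dict, size: int, key: str = "add") -> Iterator[dict]:
    """Yield one copy of `record` per chunk of its `key` array."""
    # Like `del(.add) as $object`: every top-level field except the array.
    base = {k: v for k, v in record.items() if k != key}
    for chunk in nwise(record[key], size):
        # Like `$object + {add: .}`: the array key is re-added last.
        yield {**base, key: chunk}


# The sample input from the question.
SAMPLE = """{"recDt":"2021-01-05","country":"US","name":"ABC","number":"9828",
"add":[{"evnCd":"O","rngNum":"1","state":"TX","city":"ANDERSON","postal":"77830"},
{"evnCd":"O","rngNum":"2","state":"TX","city":"ANDERSON","postal":"77832"},
{"evnCd":"O","rngNum":"3","state":"TX","city":"ANDERSON","postal":"77831"},
{"evnCd":"O","rngNum":"4","state":"TX","city":"ANDERSON","postal":"77834"}]}"""

if __name__ == "__main__":
    for part in slice_record(json.loads(SAMPLE), 2):
        # Compact output, similar to `jq -c`.
        print(json.dumps(part, separators=(",", ":")))
```

Run as a script, it prints two compact lines, each carrying the four top-level fields plus two elements of add. Like the jq answer, it keeps every field of each element, including evnCd, which the expected output in the question leaves out.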
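The trick used in the jq/awk pipeline is that jq's pretty-printed output never contains a raw tab character: indentation uses spaces, and a tab inside a string is escaped as \t. A line that starts with a tab can therefore only be a separator emitted by the jq program, and awk uses each one to start the next part-N.json file.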
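The awk half does nothing more than group lines. A rough Python equivalent is sketched below, purely to illustrate the separator idea; split_on_separator is a hypothetical name, and instead of appending to part-N.json files it returns the would-be file contents in a dict. The demo fakes jq's pretty-printed output with json.dumps(..., indent=2), which lays these small objects out much like jq's default pretty printer and likewise escapes a tab inside a string as \t.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def split_on_separator(lines: Iterable[str], sep: str = "\t") -> Dict[str, str]:
    """Group lines the way the awk script does.

    A line starting with `sep` increments the file counter and is dropped;
    every other line is collected under "part-<counter>.json". Lines seen
    before the first separator land in "part-0.json" (awk would use
    "part-.json"); the jq program above never produces such lines.
    """
    parts = defaultdict(list)
    fn = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(sep):
            fn += 1
            continue
        parts[f"part-{fn}.json"].append(line)
    return {name: "\n".join(body) + "\n" for name, body in parts.items()}


if __name__ == "__main__":
    # Made-up slices; the second name contains a real tab character.
    slices = [{"name": "ABC", "add": [1, 2]}, {"name": "A\tBC", "add": [3, 4]}]
    fake_jq_output = []
    for obj in slices:
        fake_jq_output.append("\t")  # the separator line emitted before each slice
        # json.dumps escapes the tab in "A\tBC" as \t, so no JSON line starts with a raw tab
        fake_jq_output.extend(json.dumps(obj, indent=2).splitlines())
    for name, text in split_on_separator(fake_jq_output).items():
        print(f"--- {name}")
        print(text, end="")
```

To use it on real data, the lines would come from the jq command's stdout instead of the fake list, and each entry of the returned dict would be written to its file name.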