Process huge GeoJSON file with jq
Question
Given a GEOJson file as follows:-
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"FEATCODE": 15014
},
"geometry": {
"type": "Polygon",
"coordinates": [
.....
I want to end up with the following:-
{
"type": "FeatureCollection",
"features": [
{
"tippecanoe": {"minzoom": 13},
"type": "Feature",
"properties": {
"FEATCODE": 15014
},
"geometry": {
"type": "Polygon",
"coordinates": [
.....
i.e. I have added a tippecanoe object to each feature in the features array.
I can use:-
jq '.features[].tippecanoe.minzoom = 13' <GEOJSON FILE> > <OUTPUT FILE>
This is fine for small files, but processing a large file of 414 MB seems to take forever, with the processor maxing out and nothing being written to the output file.
Reading further into jq, it appears that the --stream command-line option may help, but I am completely confused as to how to use it for my purposes.
I would be grateful for an example command line that serves my purposes, along with an explanation of what --stream is doing.
Recommended answer
A one-pass jq-only approach may require more RAM than is available. If that is the case, a simple all-jq approach is shown below, together with a more economical approach that uses jq along with awk.
The two approaches are the same except for the reconstitution of the stream of objects into a single JSON document. In the jq-only version, jq -s has to slurp every object into memory before it can emit the result, whereas awk only prints each line as it arrives, so this step can be accomplished very economically using awk.
In both cases, the large JSON input file, with objects of the form shown in the question, is assumed to be named input.json.
jq-only
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' |
jq -c -s '{type: "FeatureCollection", features: .}'
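Before committing to a multi-minute run on the full file, it can help to try the same three stages on a tiny input. The following is only a sketch: the two-feature sample and the names sample.json, tmpdir and result are hypothetical and not part of the original answer.

```shell
# Hypothetical two-feature sample; sample.json is an illustrative name only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.json" <<'EOF'
{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"FEATCODE":15014},"geometry":{"type":"Point","coordinates":[0,0]}},
 {"type":"Feature","properties":{"FEATCODE":15015},"geometry":{"type":"Point","coordinates":[1,1]}}
]}
EOF

# Same three stages as above: split into features, edit each one, re-wrap.
# Capturing into a variable is only sensible for a small sample.
result=$(jq -c '.features[]' "$tmpdir/sample.json" |
  jq -c '.tippecanoe.minzoom = 13' |
  jq -c -s '{type: "FeatureCollection", features: .}')

# List the minzoom value that each feature now carries.
printf '%s\n' "$result" | jq -c '[.features[].tippecanoe.minzoom]'
```

Note that jq appends the new tippecanoe key after the existing keys rather than placing it first as in the question's desired output; key order is not significant in JSON, so this should not matter to a consumer that parses the file.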
jq and awk
jq -c '.features[]' input.json |
jq -c '.tippecanoe.minzoom = 13' | awk '
BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
NR==1 { print; next }
{print ","; print}
END {print "] }";}'
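The question also asked about --stream, which the pipelines above do not use: their first stage still parses the whole input document. With --stream, jq instead emits a sequence of [path, value] events as it reads, and fromstream can reassemble just the pieces of interest. The sketch below swaps a streaming first stage into the jq and awk pipeline. It is an assumption-laden variant rather than the answer's own method: it assumes jq 1.6 or later, and the sample file and the names sample.json, tmpdir and streamed are hypothetical.

```shell
# Hypothetical two-feature sample; sample.json is an illustrative name only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.json" <<'EOF'
{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"FEATCODE":15014},"geometry":{"type":"Point","coordinates":[0,0]}},
 {"type":"Feature","properties":{"FEATCODE":15015},"geometry":{"type":"Point","coordinates":[1,1]}}
]}
EOF

# Stage 1 (assumed variant): read path/value events with --stream, keep only
# those under "features", strip the two leading path components
# ("features" and the array index), and rebuild one object per feature.
# Stages 2 and 3 are the same per-feature edit and awk wrapper as above.
# For a real file, redirect to an output file instead of capturing a variable.
streamed=$(jq -cn --stream '
    fromstream(2 | truncate_stream(inputs | select(.[0][0] == "features")))
  ' "$tmpdir/sample.json" |
  jq -c '.tippecanoe.minzoom = 13' |
  awk '
    BEGIN { print "{\"type\": \"FeatureCollection\", \"features\": [" }
    NR==1 { print; next }
    { print ","; print }
    END { print "] }" }')

# The awk output should parse as a single JSON document.
printf '%s\n' "$streamed" | jq -c '[.features[].tippecanoe.minzoom]'
```

The select on the first path component is there so that any other nested top-level member does not leak into the reconstructed features.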
Performance comparison
For comparison, an input file with 10,000,000 objects in .features[] was used. Its size is about 1GB.
Timings (u + s, i.e. user plus system CPU time):
jq-only: 15m 15s
jq-awk: 7m 40s
jq one-pass using map: 6m 53s
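The command behind the "jq one-pass using map" timing is not shown in the answer. One plausible form, given here purely as an assumption, updates .features in place with map. Unlike the pipelines, it holds the whole document in memory, which is why it is only an option when RAM allows. The sample file and the names sample.json, tmpdir and result_map are hypothetical.

```shell
# Hypothetical two-feature sample; sample.json is an illustrative name only.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.json" <<'EOF'
{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"FEATCODE":15014},"geometry":{"type":"Point","coordinates":[0,0]}},
 {"type":"Feature","properties":{"FEATCODE":15015},"geometry":{"type":"Point","coordinates":[1,1]}}
]}
EOF

# Assumed one-pass form: rewrite the features array in a single jq call.
# The entire document is parsed into memory before anything is written.
result_map=$(jq -c '.features |= map(.tippecanoe.minzoom = 13)' "$tmpdir/sample.json")

# List the minzoom value that each feature now carries.
printf '%s\n' "$result_map" | jq -c '[.features[].tippecanoe.minzoom]'
```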