Process huge GeoJSON file with jq

Question

Given a GeoJSON file as follows:-

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....

I want to end up with the following:-

{
  "type": "FeatureCollection",
  "features": [
    {
      "tippecanoe": {"minzoom": 13},
      "type": "Feature",
      "properties": {
        "FEATCODE": 15014
      },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          .....

i.e. I have added the tippecanoe object to each feature in the features array.

I can do this using:-

 jq '.features[].tippecanoe.minzoom = 13' <GEOJSON FILE> > <OUTPUT FILE>

This is fine for small files, but processing a large file of 414 MB seems to take forever, with the processor maxing out and nothing being written to the output file.

Reading further into jq, it appears that the --stream command-line parameter may help, but I am completely confused as to how to use it for my purposes.

I would be grateful for an example command line that serves my purposes, along with an explanation of what --stream is doing.

Recommended answer

A one-pass jq-only approach may require more RAM than is available. If that is the case, a simple all-jq approach is shown below, together with a more economical approach that uses jq along with awk.

The two approaches are the same except for the reconstitution of the stream of objects into a single JSON document. This step can be accomplished very economically using awk.

In both cases, the large JSON input file with objects of the required form is assumed to be named input.json.

jq-only

jq -c '.features[]' input.json |
    jq -c '.tippecanoe.minzoom = 13' |
    jq -c -s '{type: "FeatureCollection", features: .}'
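To see what the three stages do, the same pipeline can be run on a tiny stand-in file. This is only an illustrative sketch: the two sample features, the null geometries, and the temporary directory are made up for the example and are not part of the original data.

```shell
# Build a tiny stand-in for input.json (two made-up features).
tmp=$(mktemp -d)
cat > "$tmp/input.json" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"FEATCODE":15014},"geometry":null},
  {"type":"Feature","properties":{"FEATCODE":15015},"geometry":null}
]}
EOF

# Stage 1 emits one feature per line, stage 2 patches each feature,
# stage 3 slurps the lines back into a single FeatureCollection.
result=$(jq -c '.features[]' "$tmp/input.json" |
    jq -c '.tippecanoe.minzoom = 13' |
    jq -c -s '{type: "FeatureCollection", features: .}')

echo "$result"
```

Note that the final `-s` (slurp) stage has to hold every feature in memory at once, which is the cost the awk variant avoids.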

jq and awk

jq -c '.features[]' input.json |
   jq -c '.tippecanoe.minzoom = 13' | awk '
     BEGIN {print "{\"type\": \"FeatureCollection\", \"features\": ["; }
     NR==1 { print; next }
           {print ","; print}
     END   {print "] }";}'
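Both pipelines above still start with `jq -c '.features[]' input.json`, which parses the whole document before emitting anything. Since the question asked what --stream does, here is a hedged sketch of how that first stage might be replaced by a streaming one. It assumes jq 1.6 or later (earlier versions are believed to mishandle `truncate_stream` at depths greater than 1), and the sample file is made up for illustration.

```shell
# Tiny made-up stand-in for input.json.
tmp=$(mktemp -d)
cat > "$tmp/input.json" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"FEATCODE":15014}},
  {"type":"Feature","properties":{"FEATCODE":15015}}
]}
EOF

# With --stream, jq does not build the document; it emits one event per
# leaf value, of the form [path, leaf], plus [path] events that mark the
# end of each object or array.
jq -c --stream . "$tmp/input.json"

# Keep only events under "features", drop the first two path components
# ("features" and the array index), and let fromstream rebuild one
# complete feature at a time. Each rebuilt feature is then patched.
streamed=$(jq -cn --stream '
    fromstream(2 | truncate_stream(inputs | select(.[0][0] == "features")))
  ' "$tmp/input.json" |
  jq -c '.tippecanoe.minzoom = 13')

echo "$streamed"
```

The output is one patched feature per line, so it can be fed to the same awk script as above. Streaming should keep memory use closer to the size of a single feature, but it is usually noticeably slower than the regular parser, so it is probably only worth it when the non-streaming first stage runs out of memory.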

Performance comparison

For comparison, an input file with 10,000,000 objects in .features[] was used. Its size is about 1 GB.

Timings (user + system CPU time, u + s):

jq-only:              15m 15s
jq-awk:                7m 40s
jq one-pass using map: 6m 53s
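The command behind the "jq one-pass using map" row is not shown here. One plausible form, offered as an assumption and not necessarily the exact command that was timed, updates .features in place with map. The sample file below is made up for illustration.

```shell
# Tiny made-up stand-in for input.json.
tmp=$(mktemp -d)
cat > "$tmp/input.json" <<'EOF'
{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"FEATCODE":15014}},
  {"type":"Feature","properties":{"FEATCODE":15015}}
]}
EOF

# Assumed one-pass form: rewrite the features array with a single map.
onepass=$(jq -c '.features |= map(.tippecanoe.minzoom = 13)' "$tmp/input.json")

# The path-assignment form from the question, for comparison.
original=$(jq -c '.features[].tippecanoe.minzoom = 13' "$tmp/input.json")

echo "$onepass"
```

Both forms produce the same document on this sample. Like the command in the question, the map form loads the whole file into memory, so it only applies when enough RAM is available.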
