Process large JSON stream with jq


Question

I receive a very large JSON stream (several GB) from curl and want to process it with jq.

The data I want to extract with jq is wrapped in a document that represents the result structure:

{
  "results":[
    {
      "columns": ["n"],

      // get this
      "data": [    
        {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
        {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
      //  ... millions of rows      

      ]
    }
  ],
  "errors": []
}

I want to extract the row data with jq. This is simple:

curl XYZ | jq -r -c '.results[0].data[].row[]'

Result:

{"key1":"row1","key2":"row1"}
{"key1":"row2","key2":"row2"}

However, this always waits until curl has finished.

I experimented with the --stream option, which is made for dealing with this. I tried the following command, but it also waits until the full object has been returned from curl:

curl XYZ | jq -n --stream 'fromstream(1|truncate_stream(inputs)) | .[].data[].row[]'

Is there a way to 'jump' to the data field and start parsing the rows one by one, without waiting for the closing brackets of the enclosing document?

Recommended answer

(1) The vanilla (non-streaming) filter you would use is:

jq -r -c '.results[0].data[].row'

(2) One way to use the streaming parser here would be to use it to process the output of .results[0].data, but the combination of the two steps will probably be slower than the vanilla approach.
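
The answer does not spell out the two-step command, so the following is only one possible reading of (2), sketched under assumptions: the sample variable is a hypothetical stand-in for the output of curl XYZ (with the // annotations removed so that it is valid JSON), and jq 1.5 or later is assumed for --stream, fromstream and truncate_stream. Note that the first jq still has to read the whole document before it prints anything.

```shell
# Hypothetical sample input mirroring the structure shown in the question.
sample='{"results":[{"columns":["n"],"data":[
  {"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},
  {"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}
]}],"errors":[]}'

# Step 1: the regular parser extracts the data array.
# Step 2: the streaming parser splits that array into its elements,
#         and .row picks the row array out of each element.
two_step=$(printf '%s' "$sample" \
  | jq -c '.results[0].data' \
  | jq -cn --stream 'fromstream(1 | truncate_stream(inputs)) | .row')

printf '%s\n' "$two_step"
```

For this sample the sketch prints one row array per line, the same values the filter in (1) produces.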

(3) To produce the output you want, you could run:

jq -nc --stream '
  fromstream(inputs
    | select( [.[0][0,2,4]] == ["results", "data", "row"])
    | del(.[0][0:5]) )'
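
To see what this filter does, it can be run against a small input. The sketch below assumes a hypothetical sample variable standing in for the curl output (the // annotations from the question are removed so that it is valid JSON); the comments inside the jq program are added here for illustration.

```shell
# Hypothetical sample input mirroring the structure shown in the question.
sample='{"results":[{"columns":["n"],"data":[
  {"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},
  {"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}
]}],"errors":[]}'

rows=$(printf '%s' "$sample" | jq -nc --stream '
  fromstream(inputs
    # keep only events whose path looks like results / i / data / j / row / ...
    | select([.[0][0,2,4]] == ["results", "data", "row"])
    # drop the five leading path components so each row array becomes a top-level value
    | del(.[0][0:5]))
')

printf '%s\n' "$rows"
```

For this sample it prints [{"key1":"row1","key2":"row1"}] and [{"key1":"row2","key2":"row2"}], one per line. Each array is emitted as soon as the closing event for that row arrives, so jq does not need the end of the enclosing document.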

(4) Alternatively, you may wish to try something along these lines:

jq -nc --stream 'inputs
      | select(length==2)
      | select( [.[0][0,2,4]] == ["results", "data", "row"])
      | [ .[0][6], .[1]] '

For the illustrative input, the output from the last invocation would be:

["key1","row1"]
["key2","row1"]
["key1","row2"]
["key2","row2"]
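
The path tests in (3) and (4) are easier to follow once the raw events produced by --stream are visible. The sketch below assumes a deliberately tiny, hypothetical document (the tiny variable) with a single row and a single key.

```shell
# Hypothetical one-row document, small enough to read every event.
tiny='{"results":[{"data":[{"row":[{"key1":"row1"}]}]}]}'

# Print every streaming event: [path, leaf-value] pairs and [path] closing events.
events=$(printf '%s' "$tiny" | jq -c --stream '.')
printf '%s\n' "$events"

# Keep only the leaf events, as select(length==2) does in (4).
leaves=$(printf '%s' "$tiny" | jq -c --stream 'select(length == 2)')
printf '%s\n' "$leaves"
```

The only leaf event here is [["results",0,"data",0,"row",0,"key1"],"row1"]. Positions 0, 2 and 4 of its path hold "results", "data" and "row", and position 6 holds the key, which is why (4) builds its output from .[0][6] and .[1]. The remaining events have a single element (just a path) and mark the closing of an array or object.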
