Process large JSON stream with jq
Question
I get a very large JSON stream (several GB) from curl and try to process it with jq.
The relevant output I want to parse with jq is packed in a document representing the result structure:
{
"results":[
{
"columns": ["n"],
// get this
"data": [
{"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
{"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
// ... millions of rows
]
}
],
"errors": []
}
I want to extract the row data with jq. This is simple:
curl XYZ | jq -r -c '.results[0].data[].row[]'
Result:
{"key1": "row1", "key2": "row1"}
{"key1": "row2", "key2": "row2"}
However, this always waits until curl has completed.
I played with the --stream option, which is made for dealing with this. I tried the following command, but it also waits until the full object is returned from curl:
curl XYZ | jq -n --stream 'fromstream(1|truncate_stream(inputs)) | .[].data[].row[]'
Is there a way to 'jump' to the data field and start parsing row one by one without waiting for closing tags?
Recommended answer
(1) The vanilla filter you would use would be as follows:
jq -r -c '.results[0].data[].row'
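A quick way to check this filter is to run it on a small input first. The sketch below assumes a trimmed-down copy of the question's document held in a shell variable (the variable names are made up for illustration). It also shows that .row yields each row array whole, while .row[] unwraps it into the objects it contains:

```shell
# Trimmed copy of the question's sample document (illustrative data only).
sample='{"results":[{"columns":["n"],"data":[{"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},{"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}]}],"errors":[]}'

# .row yields each row array whole ...
arrays=$(printf '%s' "$sample" | jq -r -c '.results[0].data[].row')

# ... while .row[] unwraps each array into the objects inside it.
objects=$(printf '%s' "$sample" | jq -r -c '.results[0].data[].row[]')

printf '%s\n' "$arrays" "$objects"
```

Neither form streams: jq still parses the whole document before the filter produces anything.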
(2) One way to use the streaming parser here would be to use it to process the output of .results[0].data, but the combination of the two steps will probably be slower than the vanilla approach.
(3) To produce the output you want, you could run:
jq -nc --stream '
fromstream(inputs
| select( [.[0][0,2,4]] == ["results", "data", "row"])
| del(.[0][0:5]) )'
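To see why the filter tests path positions 0, 2 and 4, it can help to look at the raw events that --stream emits. The following is a minimal sketch, assuming jq 1.5 or later and a cut-down sample input invented for illustration:

```shell
# Cut-down sample with the same nesting as the question's document (illustrative data only).
sample='{"results":[{"columns":["n"],"data":[{"row":[{"key1":"row1"}],"meta":[{"key":"value"}]}]}],"errors":[]}'

# Print every streaming event on its own line.
# Leaf events have the form [path, value]; closing events have the form [path].
events=$(printf '%s' "$sample" | jq -c --stream '.')

printf '%s\n' "$events"
```

Among the events is the leaf [["results",0,"data",0,"row",0,"key1"],"row1"]. In its path, positions 0, 2 and 4 hold "results", "data" and "row", and positions 1, 3 and 5 are array indices. The select in (3) keeps only events whose paths have that shape, and del(.[0][0:5]) then strips the first five path components before fromstream reassembles values.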
(4) Alternatively, you may wish to try something along these lines:
jq -nc --stream 'inputs
| select(length==2)
| select( [.[0][0,2,4]] == ["results", "data", "row"])
| [ .[0][6], .[1]] '
For the illustrative input, the output from the last invocation would be:
["key1","row1"]
["key2","row1"]
["key1","row2"]
["key2","row2"]
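To try invocation (4) without a multi-gigabyte download, the question's sample document can be piped in from a shell variable in place of curl. This is only a sketch: the // comments are removed so that the input is valid JSON, and the variable names are made up for illustration.

```shell
# The question's sample input with the // comments removed so it is valid JSON.
sample='{
  "results": [
    {
      "columns": ["n"],
      "data": [
        {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
        {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
      ]
    }
  ],
  "errors": []
}'

# Keep only leaf events (length 2) under results[*].data[*].row,
# then emit [key, value] pairs: path position 6 is the key inside the row object.
pairs=$(printf '%s' "$sample" | jq -nc --stream 'inputs
  | select(length==2)
  | select( [.[0][0,2,4]] == ["results", "data", "row"])
  | [ .[0][6], .[1]] ')

printf '%s\n' "$pairs"
```

Each pair is emitted as soon as its leaf has been read, so with a real curl pipeline output can start before the download finishes.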