Convert json using jq based on specific constraints
Question
I have a JSON file, 'OpenEnded_mscoco_val2014.json', which contains 121,512 questions.
Here is a sample:
"questions": [
{
"question": "What is the table made of?",
"image_id": 350623,
"question_id": 3506232
},
{
"question": "Is the food napping on the table?",
"image_id": 350623,
"question_id": 3506230
},
{
"question": "What has been upcycled to make lights?",
"image_id": 350623,
"question_id": 3506231
},
{
"question": "Is this an Spanish town?",
"image_id": 8647,
"question_id": 86472
}
]
I used jq -r '.questions | [map(.question), map(.image_id), map(.question_id)] | @csv' OpenEnded_mscoco_val2014_questions.json >> temp.csv
to convert the JSON into CSV.
But the CSV output is the questions followed by the image_ids, which is what the above code does.
The expected output is:
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
Also, is it possible to keep only the results having image_id <= 10000 and to group questions having the same image_id? For example, results 1, 2 and 3 of the JSON could be combined to have 3 questions, 1 image_id and 3 question_ids.
EDIT: The first problem is solved by the possible duplicate question. I would like to know whether it is possible to use a comparison operator on the jq command line when converting the JSON file; in this case, to get all fields from the JSON only where image_id <= 10000.
Recommended answer
1) Given your input (suitably elaborated to make it valid JSON), the following query generates the CSV output as shown:
$ jq -r '.questions[] | [.question, .image_id, .question_id] | @csv'
"What is the table made of?",350623,3506232
"Is the food napping on the table?",350623,3506230
"What has been upcycled to make lights?",350623,3506231
"Is this an Spanish town?",8647,86472
The key thing to remember here is that @csv requires a flat array, but as with all jq filters, you can feed it a stream.
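To make the "flat array per row" point concrete, here is a minimal sketch that runs against two records borrowed from the question's sample. The variable names (`sample`, `stream_rows`, `column_rows`) are made up for illustration. It also shows one way the column-wise attempt from the question could be salvaged, assuming a jq version that provides `transpose` (1.5 or later, as far as I know).

```shell
#!/bin/sh
# Two records borrowed from the question's sample; the real file is much larger.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# Stream approach: .questions[] emits one object at a time, so @csv sees
# one flat array per question and prints one row per question.
stream_rows=$(printf '%s\n' "$sample" |
  jq -r '.questions[] | [.question, .image_id, .question_id] | @csv')

# Column approach: build three column arrays as in the question, then
# transpose them into rows before handing each row to @csv.
column_rows=$(printf '%s\n' "$sample" |
  jq -r '.questions
         | [map(.question), map(.image_id), map(.question_id)]
         | transpose[]
         | @csv')

printf '%s\n' "$stream_rows"
```

Both variables should end up holding the same two rows; the stream form is shorter and avoids building the intermediate column arrays.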
2) To filter using the criterion .image_id <= 10000, just interpose the appropriate select/1 filter:
.questions[]
| select(.image_id <= 10000)
| [.question, .image_id, .question_id]
| @csv
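Since the question asks about doing the comparison from the command line, the threshold can also be passed in rather than hard-coded. The following is only a sketch: the shell variable `max_id` and the jq variable `$max` are names chosen here, and the two-record `sample` stands in for the real file. It relies on jq's `--argjson` option, which binds a JSON value (here a number, not a string) to a variable.

```shell
#!/bin/sh
# Two records borrowed from the question's sample; the real file is much larger.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# Hypothetical threshold supplied by the caller.
max_id=10000

# --argjson makes $max a JSON number, so the <= comparison is numeric.
filtered=$(printf '%s\n' "$sample" |
  jq -r --argjson max "$max_id" '
    .questions[]
    | select(.image_id <= $max)
    | [.question, .image_id, .question_id]
    | @csv')

printf '%s\n' "$filtered"
```

With this sample only the record with image_id 8647 passes the filter. Using `--arg` instead would bind a string, and jq orders numbers before strings, so the comparison would not behave numerically.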
3) To sort by image_id, use sort_by(.image_id):
.questions
| sort_by(.image_id)
|.[]
| [.question, .image_id, .question_id]
| @csv
4) To group by .image_id, you would pipe the output of the following pipeline into your own pipeline:
.questions | group_by(.image_id)
You will, however, have to decide exactly how you want to combine the grouped objects.
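As one possible way to combine the groups (an assumption about the desired layout, not something the question specifies), the sketch below emits one CSV row per image_id, with the questions joined by " | " and the question_ids joined by ";". The separators and the variable names are arbitrary choices made for illustration.

```shell
#!/bin/sh
# Three records borrowed from the question's sample; two share an image_id.
sample='{"questions":[
  {"question":"What is the table made of?","image_id":350623,"question_id":3506232},
  {"question":"Is the food napping on the table?","image_id":350623,"question_id":3506230},
  {"question":"Is this an Spanish town?","image_id":8647,"question_id":86472}]}'

# group_by(.image_id) yields an array of groups, sorted by image_id.
# Each group is an array of objects sharing one image_id, so the id is
# taken from the first element and the other fields are joined into strings.
grouped=$(printf '%s\n' "$sample" |
  jq -r '
    .questions
    | group_by(.image_id)[]
    | [ .[0].image_id,
        (map(.question) | join(" | ")),
        (map(.question_id | tostring) | join(";")) ]
    | @csv')

printf '%s\n' "$grouped"
```

The question_ids are converted with `tostring` before joining so that the sketch does not depend on how a given jq version's `join` treats numbers. A `select(.image_id <= 10000)` step could be added by replacing `.questions` with `[.questions[] | select(.image_id <= 10000)]` before the `group_by`.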