使用jq在json中不同值的唯一组合 [英] Unique combinations of different values in json using jq
问题描述
我有一个看起来像这样的json文件(input.json):
I have a json file(input.json) which looks like this :
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)
我只想过滤出每个值jq的唯一组合. 结果应如下所示:
I want to filter out only the unique combinations of each of the values jq. Results should look like:
{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}
我尝试与其他标头对header1进行分组,但并没有产生唯一的结果.
我用过unique
,但是没有产生正确的结果.
I tried doing group by of header1 with the other headers but it didn't generate unique results.
I've used unique
but that didnt generate the proper results.
我怎么能得到这个?我是jq的新手,却找不到很多教程.
How can I get this? Im new to jq and not finding many tutorials on it.
谢谢
推荐答案
由于 peak 表示您输入的不是合法的JSON,我可以自由地对其进行更正并将其转换为单个对象的列表:
Since as peak indicated your input isn't legal JSON I've taken the liberty of correcting it and converting to a list of individual objects:
{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
如果此数据位于data.json
中并且您运行
If this data is in data.json
and you run
jq -M -s -f filter.jq data.json
以及以下filter.jq
foreach .[] as $r (
{}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
它将以原始顺序生成以下输出,没有重复.
it will generate the following output in the original order with no duplicates.
{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
请注意
($r | map(.))
用于生成仅包含每行中的值的数组 假定它总是产生唯一的密钥路径.这是对的 样本数据,但可能不适用于更复杂的值.
is used to generate an array containing just the values from each row which is assumed to always produce a unique key path. This is true for the sample data but may not be true for more complex values.
更慢但更强大的filter.jq
是
foreach .[] as $r (
{}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)
使用整个行的json表示作为唯一键来确定以前是否曾看到过行.
which uses the json representation of the entire row as a unique key to determine if a row has been previously seen.
这篇关于使用jq在json中不同值的唯一组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!