使用jq在json中不同值的唯一组合 [英] Unique combinations of different values in json using jq

查看:140
本文介绍了使用jq在json中不同值的唯一组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个看起来像这样的json文件(input.json):

I have a json file(input.json) which looks like this :

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)

我只想过滤出每个值jq的唯一组合. 结果应如下所示:

I want to filter out only the unique combinations of each of the values jq. Results should look like:

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}

我尝试与其他标头对header1进行分组,但并没有产生唯一的结果. 我用过unique,但是没有产生正确的结果.

I tried doing group by of header1 with the other headers but it didn't generate unique results. I've used unique but that didnt generate the proper results.

我怎么能得到这个?我是jq的新手,却找不到很多教程.

How can I get this? Im new to jq and not finding many tutorials on it.

谢谢

推荐答案

由于 peak 表示您输入的不是合法的JSON,我可以自由地对其进行更正并将其转换为单个对象的列表:

Since as peak indicated your input isn't legal JSON I've taken the liberty of correcting it and converting to a list of individual objects:

{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}

如果此数据位于data.json中并且您运行

If this data is in data.json and you run

jq -M -s -f filter.jq data.json

以及以下filter.jq

foreach .[] as $r (
  {}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)

它将以原始顺序生成以下输出,没有重复.

it will generate the following output in the original order with no duplicates.

{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}

请注意

($r | map(.))

用于生成仅包含每行中的值的数组 假定它总是产生唯一的密钥路径.这是对的 样本数据,但可能不适用于更复杂的值.

is used to generate an array containing just the values from each row which is assumed to always produce a unique key path. This is true for the sample data but may not be true for more complex values.

更慢但更强大的filter.jq

foreach .[] as $r (
  {}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)

使用整个行的json表示作为唯一键来确定以前是否曾看到过行.

which uses the json representation of the entire row as a unique key to determine if a row has been previously seen.

这篇关于使用jq在json中不同值的唯一组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆