Parsing JSON lines with JQ for flapping key values in sequence

Problem description

I have a file of JSON lines whose validity needs to be verified based on the sequence of each object's flapping "alert.status" value.

A sample of valid json lines:

{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}

The above file is valid because the duplicate JSON objects (lines 1, 3, 5 and lines 2, 4, 6) have a "status" that flaps from "On" to "Off" to "On", and so on.

A sample of invalid json lines:

{"id":123,"code":"foo","severity":"Critical","severityCode":1, "property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"On"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":123,"code":"foo","severity":"Critical","severityCode":1,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}
{"id":456,"code":"bar","severity":"High","severityCode":2,"property":{ "priority":"top", "owner":"dev"}, "alert":{"mgmt":"yes", "status":"Off"}}

The above is invalid because the JSON objects on lines 1 and 3 are duplicates whose "status" value stays the same ("On") instead of flapping between "On" and "Off".

I tried to use jq to read the JSON lines into a JSON array:

jq --slurp 'map(select(. >= 2))' jsonfile > jsonarray

But since the order of the lines is important, I don't think I can use group_by to look for duplicates (group_by's result is sorted).

I'm thinking about inserting a new key with an incrementing number into each JSON object, so that after using group_by we can sort the result by this new key to recover the original sequence.

Is there a way in jq to group by all keys except two (in this case "status" and the new key with the incrementing number)?

Is there a better approach to solving this problem?

Thanks so much for your help!

Answer

I don't think I can use group_by to look for duplicates (the group_by's result is sorted).

That's right, but it's very easy to define a non-sorting "group_by" which, as we'll see, can also easily be used to group by all keys except specifically designated ones.

GROUPS_BY

First, here is a simple filter which retains the original order of items within each group:

# The filter, f, must produce a string for each item in `stream`
def GROUPS_BY(stream; f):
  reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;

The "S" in the name emphasizes that the function is stream-oriented: first, its first argument is a stream, and second, it produces a stream of groups. The name is upper-cased to distinguish it from the existing built-in group_by.
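For comparison, here is a rough Python sketch of the same non-sorting grouping; it is an illustration, not part of the original answer. Like the comment on the jq definition, it assumes `f` returns a string, and it relies on Python dicts preserving insertion order (guaranteed since Python 3.7), so items keep their original order within each group. The name `groups_by` and the sample words are made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def groups_by(stream: Iterable[T], f: Callable[[T], str]) -> Iterator[List[T]]:
    """Yield groups of items sharing the same string key f(item), without sorting.

    Items keep their original relative order within each group, and groups
    are yielded in the order their keys first appeared.
    """
    groups: Dict[str, List[T]] = {}
    for x in stream:
        # Counterpart of jq's `.[$x|f] += [$x]`: append x to its key's list.
        groups.setdefault(f(x), []).append(x)
    yield from groups.values()


# Usage: group words by their first letter; order within each group is kept.
words = ["banana", "apple", "blueberry", "avocado", "cherry"]
print(list(groups_by(words, lambda w: w[0])))
# → [['banana', 'blueberry'], ['apple', 'avocado'], ['cherry']]
```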

Example

To illustrate how this can be used to group by all but a specific key, consider this example (taken from another SO question):

def data:
  [{"foo":1,"bar":"a","baz":"whatever"},
   {"foo":1,"bar":"a","baz":"hello"},
   {"foo":1,"bar":"b","baz":"world"}] ;

GROUPS_BY(data[]; del(.baz) | tostring)

Output

[{"foo":1,"bar":"a","baz":"whatever"},{"foo":1,"bar":"a","baz":"hello"}]
[{"foo":1,"bar":"b","baz":"world"}]
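The same "all keys but one" grouping can be sketched in Python (illustrative only; `groups_by_all_except` is a hypothetical helper). Instead of jq's `tostring`, it serialises the remaining keys with `json.dumps(..., sort_keys=True)`, an assumption made here so that two objects whose keys merely appear in a different order still land in the same group.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def groups_by_all_except(
    items: Iterable[Dict[str, Any]], excluded: str
) -> Iterator[List[Dict[str, Any]]]:
    """Group dicts on every top-level key except `excluded`, preserving order."""
    groups: Dict[str, List[Dict[str, Any]]] = {}
    for item in items:
        # Like jq's `del(.baz) | tostring`: drop the excluded key, then
        # serialise the rest into a hashable string key.
        rest = {k: v for k, v in item.items() if k != excluded}
        key = json.dumps(rest, sort_keys=True)
        groups.setdefault(key, []).append(item)
    yield from groups.values()


data = [
    {"foo": 1, "bar": "a", "baz": "whatever"},
    {"foo": 1, "bar": "a", "baz": "hello"},
    {"foo": 1, "bar": "b", "baz": "world"},
]

for group in groups_by_all_except(data, "baz"):
    # Compact separators mimic jq's -c style output.
    print(json.dumps(group, separators=(",", ":")))
```

Run on the same data, it prints the two groups shown in the output above.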

Refinement

It may be objected that requiring that f always be string-valued introduces several potential difficulties, so here is an efficient but more versatile definition:

# Emit a stream of the groups defined by f, without using sort.
# f need not be string-valued.
def GROUPS_BY(stream; f): 
   reduce stream as $x ({};
     ($x|f) as $s
     | ($s|type) as $t
     | (if $t == "string" then $s else ($s|tojson) end) as $y
     | .[$t][$y] += [$x] )
   | .[][]
   ;

Now we can simply write:

GROUPS_BY(data[]; del(.baz))
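The point of the refinement is that the key records the JSON type as well as the text, so that, for example, the string "1" and the number 1 (whose `tojson` is also "1") cannot collide. A Python sketch of the same nested `.[$t][$y]` bookkeeping follows; `json_type` is a hypothetical helper standing in for jq's `type`, and the small `items` list is invented for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List


def json_type(value: Any) -> str:
    """Hypothetical helper: the JSON type name of a parsed value, like jq's `type`."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")


def groups_by(stream: Iterable[Any], f: Callable[[Any], Any]) -> Iterator[List[Any]]:
    """Group items by any JSON-valued f without sorting, keyed first by type."""
    groups: Dict[str, Dict[str, List[Any]]] = {}
    for x in stream:
        s = f(x)
        t = json_type(s)
        # Strings are used as-is; other values are serialised (cf. jq's tojson).
        # Number formatting follows Python's json.dumps, which may differ from jq.
        y = s if t == "string" else json.dumps(s, sort_keys=True)
        groups.setdefault(t, {}).setdefault(y, []).append(x)
    for by_key in groups.values():  # like jq's `.[][]`
        yield from by_key.values()


items = [{"v": "1"}, {"v": 1}, {"v": "1"}, {"v": True}]
print(list(groups_by(items, lambda d: d["v"])))
# → [[{'v': '1'}, {'v': '1'}], [{'v': 1}], [{'v': True}]]
```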

Usage with the JSON-Lines file

The simplest way to use GROUPS_BY with a JSON-Lines file is with inputs. For example, assuming the more versatile definition is used, you'd write:

GROUPS_BY(inputs; del(.alert))

Don't forget to invoke jq with the -n option when using inputs.

A filter to determine validity

According to my understanding of the problem, the following filter can be used to determine validity of a group:

def changing(f):
  def c:
    if length <= 1 then true
    elif (.[0] | f) == (.[1] | f) then false
    else .[1:] | c
    end;
  c ;

(The inner function, c, is used here for efficient recursion. Of course, if computing f redundantly is a concern, then a variant definition should be used.)
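One such variant, sketched in Python rather than jq, computes f once per item up front and then compares neighbours pairwise; like the jq `changing`, it treats groups of zero or one item as valid. The `status` accessor and the two sample groups are illustrative assumptions.

```python
from typing import Any, Callable, Sequence


def changing(group: Sequence[Any], f: Callable[[Any], Any]) -> bool:
    """True if f differs between every pair of adjacent items in group.

    Groups of length 0 or 1 count as changing, as in the jq definition,
    but here f is evaluated exactly once per item.
    """
    values = [f(item) for item in group]
    return all(a != b for a, b in zip(values, values[1:]))


def status(obj: dict) -> Any:
    """Accessor corresponding to jq's `.alert.status`."""
    return obj["alert"]["status"]


flapping = [{"alert": {"status": s}} for s in ("On", "Off", "On")]
stuck = [{"alert": {"status": s}} for s in ("On", "On", "Off")]
print(changing(flapping, status))  # True
print(changing(stuck, status))     # False
```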

Solution

Putting it all together using the more versatile definition of GROUPS_BY, and assuming we wish to identify the invalid groups, the solution seems to be a two-liner:

GROUPS_BY(inputs; del(.alert))
| select( changing(.alert.status) | not )
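To check the whole approach end to end, here is a Python sketch of the same pipeline (an illustration under assumptions, not a replacement for the jq two-liner): it reads one JSON object per line, groups on every key except "alert" as `del(.alert)` does, and keeps the groups whose status does not flap. The sample lines are the question's data trimmed to `id`, `code` and `alert` for brevity; since the omitted fields are identical for lines sharing an `id`, trimming should not change the grouping.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, List, TextIO

Obj = Dict[str, Any]


def read_json_lines(fp: TextIO) -> Iterator[Obj]:
    """Parse one JSON object per non-blank line (assumes JSON Lines input;
    jq's `inputs` is more lenient about how values are separated)."""
    for line in fp:
        if line.strip():
            yield json.loads(line)


def groups_by_all_but_alert(objs: Iterable[Obj]) -> List[List[Obj]]:
    """Order-preserving grouping on every key except "alert",
    analogous to GROUPS_BY(inputs; del(.alert))."""
    groups: Dict[str, List[Obj]] = {}
    for obj in objs:
        rest = {k: v for k, v in obj.items() if k != "alert"}
        groups.setdefault(json.dumps(rest, sort_keys=True), []).append(obj)
    return list(groups.values())


def status_flaps(group: List[Obj]) -> bool:
    """True if no two consecutive items share the same alert.status."""
    statuses = [obj["alert"]["status"] for obj in group]
    return all(a != b for a, b in zip(statuses, statuses[1:]))


def invalid_groups(fp: TextIO) -> List[List[Obj]]:
    """Counterpart of `select( changing(.alert.status) | not )`."""
    return [g for g in groups_by_all_but_alert(read_json_lines(fp)) if not status_flaps(g)]


VALID = """\
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"On"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"Off"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"On"}}
"""

INVALID = """\
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"On"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"On"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
{"id":123,"code":"foo","alert":{"mgmt":"yes","status":"Off"}}
{"id":456,"code":"bar","alert":{"mgmt":"yes","status":"Off"}}
"""

print([g[0]["id"] for g in invalid_groups(io.StringIO(VALID))])    # []
print([g[0]["id"] for g in invalid_groups(io.StringIO(INVALID))])  # [123, 456]
```

By this rule both groups of the invalid sample are reported: besides lines 1 and 3 (id 123, both "On"), lines 4 and 6 (id 456) are both "Off".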
