jq: groupby and nested json arrays

Problem description

Suppose I have: [[1,2], [3,9], [4,2], [], []]

I would like to know the scripts to get:

  • The number of nested lists which are/are not empty, i.e. I want to get: [3,2]

  • The number of nested lists which do/do not contain the number 3, i.e. I want to get: [1,4]

  • The number of nested lists for which the sum of the elements is/isn't less than 4, i.e. I want to get: [3,2]

That is, basic examples of partitioning nested data.

Answer

Since stackoverflow.com is not a coding service, I'll confine this response to the first question, with the hope that it will convince you that learning jq is worth the effort.

Let's begin by refining the question about the counts of the lists "which are/are not empty" to emphasize that the first number in the answer should correspond to the number of empty lists (2), and the second number to the rest (3). That is, the required answer should be [2,3].

The next step might be to ask whether group_by can be used. If the ordering did not matter, we could simply write:

group_by(length==0) | map(length)

This returns [3,2], which is not quite what we want. It's now worth checking the documentation about what group_by is supposed to do. On checking the details at https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions, we see that by design group_by does indeed sort by the grouping value.
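To see the sorting concretely, here is a minimal shell sketch that prints the groups themselves before counting them. It assumes jq is on the PATH; the shell variable names are illustrative only.

```shell
# Input from the question.
data='[[1,2],[3,9],[4,2],[],[]]'

# group_by sorts on the key (length==0), so the false group
# (non-empty lists) comes before the true group (empty lists).
groups=$(echo "$data" | jq -c 'group_by(length==0)')
echo "$groups"

# Counting the members of each group therefore yields [3,2], not [2,3].
counts=$(echo "$data" | jq -c 'group_by(length==0) | map(length)')
echo "$counts"
```

Because jq's sort is stable, the lists inside each group keep their original order.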

Since in jq, false < true, we could fix our first attempt by writing:

group_by(length > 0) | map(length)
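The boolean ordering this fix relies on can be checked directly. The sketch below assumes jq is on the PATH, and its variable names are illustrative. It also shows a limitation worth knowing: group_by only emits groups that actually occur, so when every list is non-empty the result has a single element.

```shell
data='[[1,2],[3,9],[4,2],[],[]]'

# jq sorts false before true.
order=$(jq -n -c '[true, false] | sort')
echo "$order"

# Grouping on length > 0 puts the empty lists (key false) first.
counts=$(echo "$data" | jq -c 'group_by(length > 0) | map(length)')
echo "$counts"

# group_by emits only the groups that occur: with no empty lists
# there is just one group, so the result is [3], not [0,3].
nonempty=$(echo '[[1],[2],[3]]' | jq -c 'group_by(length > 0) | map(length)')
echo "$nonempty"
```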

That's nice, but since group_by is doing so much work when all we really need is a way to count, it's clear we should be able to come up with a more efficient (and hopefully less opaque) solution.

At its core the problem boils down to counting, so let's define a generic tabulate filter for producing the counts of distinct string values. Here's a def that will suffice for present purposes:

# Produce a JSON object recording the counts of distinct
# values in the given stream, which is assumed to consist 
# solely of strings.
def tabulate(stream):
  reduce stream as $s ({}; .[$s] += 1);
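As a quick check of the definition, the following sketch feeds tabulate a literal stream of strings. It assumes jq is on the PATH. The update `.[$s] += 1` works on a key that has not been seen yet because `.[$s]` is then null, and in jq null + 1 is 1. The stream has to consist of strings because object keys are strings; indexing an object with a number raises an error.

```shell
# The tabulate def from the answer, applied to a literal stream.
# -S sorts the output keys so the result is predictable.
bag=$(jq -n -c -S '
  def tabulate(stream):
    reduce stream as $s ({}; .[$s] += 1);
  tabulate("a", "b", "a")
')
echo "$bag"
```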

An efficient solution can now be written down in just two lines:

tabulate(.[] | length==0 | tostring)
| [.["true", "false"]]
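Here is a runnable version that combines the definition with the two-line filter. It assumes jq is on the PATH, and the variable names are illustrative. The second call covers an edge case. If one of the keys never occurs, `.["true"]` or `.["false"]` is null, and the extra `. // 0` step is one way to turn that null into a count of 0.

```shell
data='[[1,2],[3,9],[4,2],[],[]]'

# tabulate plus the two-line filter from the answer: [empty, non-empty].
counts=$(echo "$data" | jq -c '
  def tabulate(stream):
    reduce stream as $s ({}; .[$s] += 1);
  tabulate(.[] | length==0 | tostring)
  | [.["true", "false"]]
')
echo "$counts"

# Variant: map a missing key (null) to 0 so both slots are always numbers.
no_empty=$(echo '[[1],[2]]' | jq -c '
  def tabulate(stream):
    reduce stream as $s ({}; .[$s] += 1);
  tabulate(.[] | length==0 | tostring)
  | [.["true", "false"] | . // 0]
')
echo "$no_empty"
```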

QED
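The answer deliberately stops at the first question, but the same pattern carries over to the other two. The following is an editorial illustration, not part of the original answer. It assumes jq 1.5 or later, which provides `any(generator; condition)`. It also uses `(add // 0)` because `add` on an empty array returns null.

```shell
data='[[1,2],[3,9],[4,2],[],[]]'

# Shared prelude: the tabulate def from the answer.
prelude='def tabulate(stream): reduce stream as $s ({}; .[$s] += 1);'

# Question 2: lists that do / do not contain the number 3.
q2=$(echo "$data" | jq -c "$prelude"'
  tabulate(.[] | any(.[]; . == 3) | tostring) | [.["true", "false"]]')
echo "$q2"

# Question 3: lists whose sum is / is not less than 4.
# (add // 0) treats an empty list, whose add is null, as summing to 0.
q3=$(echo "$data" | jq -c "$prelude"'
  tabulate(.[] | (add // 0) < 4 | tostring) | [.["true", "false"]]')
echo "$q3"
```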

The function named tabulate above is sometimes called bow (for "bag of words"). In some ways, that would be a better name, especially as it would make sense to reserve the name tabulate for similar functionality that would work for arbitrary streams.
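One way to build the more general tabulate mentioned above is to key the counts on each value's JSON text, obtained with tojson. That lets any JSON value be counted, and it keeps, for example, the number 1 and the string "1" apart. This is a hypothetical sketch, not something the answer defines. The name tabulate_any is invented for illustration, and the code assumes jq is on the PATH.

```shell
data='[[1,2],[3,9],[4,2],[],[]]'

# Hypothetical generalisation: use each value's JSON text as the key,
# so non-string values (numbers, booleans, arrays, ...) can be counted.
lengths=$(echo "$data" | jq -c -S '
  def tabulate_any(stream):
    reduce stream as $x ({}; .[$x | tojson] += 1);
  tabulate_any(.[] | length)
')
echo "$lengths"

# Booleans work directly, without an explicit tostring.
flags=$(echo "$data" | jq -c -S '
  def tabulate_any(stream):
    reduce stream as $x ({}; .[$x | tojson] += 1);
  tabulate_any(.[] | length == 0)
')
echo "$flags"
```

The trade-off is that string values end up under quoted keys. For example, "a" is counted under the key "\"a\"", so any later lookup has to use the JSON text of the value.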
