Getting the object array index in jq

Question

I have a JSON array of objects that looks like this (produced by i3-msg -t get_workspaces):

[
  {
    "name": "1",
    "urgent": false
  },
  {
    "name": "2",
    "urgent": false
  },
  {
    "name": "something",
    "urgent": false
  }
]

I am trying to use jq to find the index, within the list, of the element matched by a select query. jq has something called index(), but it seems to support only strings?

Using something like i3-msg -t get_workspaces | jq '.[] | select(.name=="something")' gives me the object I want, but I want its index: in this case 2 (counting from 0).

Is this possible with jq alone?

Recommended answer

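I originally offered a strategy toward a solution, which the OP quickly accepted. Later, @peak and @Jeff Mercado offered better and more complete solutions, so I have turned this answer into a community wiki. Please improve it if you can.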

A straightforward solution (pointed out by @peak) is to use the builtin function index:

map(.name == "something") | index(true)

The jq documentation confusingly suggests that index operates only on strings, but it operates on arrays as well. Thus index(true) returns the index of the first true in the array of booleans produced by map. If no item satisfies the condition, the result is null.
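A minimal sketch for trying this on the question's sample input, assuming jq 1.5 or later is on the PATH. The shell function ws_index and the WORKSPACES variable are hypothetical names; --arg passes the search term into jq as the string $name. The last line shows index on a plain string, for comparison.

```shell
#!/bin/bash
# Sample input from the question, compacted onto one line.
WORKSPACES='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# Hypothetical helper: print the 0-based index of the first workspace named $1,
# or null if none matches. map() yields booleans; index(true) finds the first true.
ws_index() {
  jq --arg name "$1" 'map(.name == $name) | index(true)' <<<"$WORKSPACES"
}

ws_index something           # prints 2
ws_index missing             # prints null
jq -n '"a,b" | index(",")'   # prints 1 (index on a string)
```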

jq expressions are evaluated lazily, but map will traverse the entire input array. We can verify this by rewriting the above code with a debug statement:

[ .[] | debug | .name == "something" ] | index(true)
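One way to see the full traversal is to count the DEBUG lines that debug writes to stderr. In this sketch (hypothetical function name map_visits, same sample input, jq 1.5+ assumed) the search is for "1", the very first element, yet all three elements should still pass through debug.

```shell
#!/bin/bash
WORKSPACES='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# Count how many elements pass through `debug` for the map-based search.
# `2>&1 >/dev/null` sends stderr (the DEBUG lines) into the pipe and drops stdout.
map_visits() {
  jq -c --arg name "$1" '[ .[] | debug | .name == $name ] | index(true)' \
    <<<"$WORKSPACES" 2>&1 >/dev/null | grep -c DEBUG
}

map_visits 1   # prints 3, although the match is at index 0
```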

As suggested by @peak, the key to doing better is to use the break statement introduced in jq 1.5:

label $out | 
foreach .[] as $item (
  -1; 
  .+1; 
  if $item.name == "something" then 
    ., 
    break $out 
  else 
    empty
  end
) // null

Note that // is not a comment; it is the alternative operator. If the name is not found, foreach produces no output at all (empty), which the alternative operator converts to null.
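A minimal sketch, assuming jq 1.5+ and the same sample input, that wraps the label/break version in a hypothetical shell function get_index_break and passes the name with --arg. A debug call is inserted on the input stream only to count visits: searching for "1" should stop after the first element, unlike the map-based version.

```shell
#!/bin/bash
WORKSPACES='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# Early-exit search: foreach counts positions, `break $out` stops at the first match,
# and `// null` turns "no output" into null. `debug` is only there to count visits.
get_index_break() {
  jq -c --arg name "$1" '
    label $out |
    foreach (.[] | debug) as $item (
      -1;
      . + 1;
      if $item.name == $name then ., break $out else empty end
    ) // null
  ' <<<"$WORKSPACES"
}

get_index_break something 2>/dev/null                # prints 2
get_index_break missing 2>/dev/null                  # prints null
get_index_break 1 2>&1 >/dev/null | grep -c DEBUG    # prints 1: one element visited
```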

Another approach is to recursively process the array:

def get_index(name):
  name as $name |
  if (. == []) then
    null
  elif (.[0].name == $name) then
    0
  else
    (.[1:] | get_index($name)) as $result |
    if ($result == null) then null else $result+1 end
  end;
get_index("something")

However, as @Jeff Mercado pointed out, this recursive implementation uses stack space proportional to the length of the array in the worst case. In version 1.5, jq introduced tail call optimization (TCO), which lets us optimize this away with a local helper function (this is a minor adaptation of a solution provided by @Jeff Mercado, made consistent with the examples above):

def get_index(name): 
  name as $name | 
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
get_index("something")
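A quick way to check that the plain recursive version and the TCO helper version agree is to define both in one program and print their results side by side. This is a sketch assuming jq 1.5+; the names get_index_rec, get_index_tco and compare_versions are hypothetical renamings so the two definitions can coexist.

```shell
#!/bin/bash
WORKSPACES='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# Prints [recursive result, TCO-helper result] for the name given as $1.
compare_versions() {
  jq -c --arg name "$1" '
    # Plain recursion on .[1:]: stack depth grows with the position searched.
    def get_index_rec(name):
      name as $name |
      if . == [] then null
      elif .[0].name == $name then 0
      else (.[1:] | get_index_rec($name)) as $result |
           if $result == null then null else $result + 1 end
      end;

    # Tail-recursive local helper carrying {array, i, len} as its state.
    def get_index_tco(name):
      name as $name |
      def _get_index:
        if .i >= .len then null
        elif .array[.i].name == $name then .i
        else .i += 1 | _get_index
        end;
      {array: ., i: 0, len: length} | _get_index;

    [get_index_rec($name), get_index_tco($name)]
  ' <<<"$WORKSPACES"
}

compare_versions something   # prints [2,2]
compare_versions missing     # prints [null,null]
```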

According to @peak, obtaining the length of an array in jq is a constant-time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this.

Now let's try to actually measure. Here is an example of measuring the simple solution:

#!/bin/bash

jq -n ' 

  def get_index(name): 
    name as $name |
    map(.name == $name) | index(true)
  ;

  def gen_input(n):  
    n as $n |
    if ($n == 0) then 
      []
    else
      gen_input($n-1) + [ { "name": $n, "urgent":false } ]
    end
  ;  

  2000 as $n |
  gen_input($n) as $i |
  [(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1]
'
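One caveat about the harness above: jq passes function arguments as closures evaluated against the function's own input, so in $i | get_index(.) the . is the array $i itself, not the loop counter. Every call therefore searches for a name that never matches and scans the whole array, so the timings reported below reflect the not-found case, and the printed 1999 is just the last counter value. A minimal sketch of a variant that looks up each existing name once, assuming jq 1.5+ (the function name bench_simple and its optional size argument are hypothetical):

```shell
#!/bin/bash
# Hypothetical benchmark variant: builds n workspaces named 1..n (as gen_input does)
# and looks each one up once, so matches occur at every position 0..n-1.
bench_simple() {
  jq -n --argjson n "${1:-2000}" '
    def get_index($name):
      map(.name == $name) | index(true);

    [range(1; $n + 1) | {name: ., urgent: false}] as $i |

    # Bind the counter to $k first, so get_index receives the number, not the array.
    # .[-1] keeps only the last result: the index of name $n, i.e. $n - 1.
    [range(1; $n + 1) as $k | $i | get_index($k)] | .[-1]
  '
}

bench_simple 10   # prints 9
```

For a timing comparable in size to the original, run time bench_simple 2000, which should print 1999, here as the index actually found for the last name.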

When I run this on my machine, I get the following:

$ time ./simple
1999

real    0m10.024s
user    0m10.023s
sys     0m0.008s

If I replace this with the "fast" version of get_index:

def get_index(name):
  name as $name |
  label $out |
  foreach .[] as $item (
    -1;
    .+1;
    if $item.name == $name then
      .,
      break $out
    else
      empty
    end
  ) // null;

Then I get:

$ time ./fast
1999

real    0m13.165s
user    0m13.173s
sys     0m0.000s

And if I replace it with the "fast" recursive version:

def get_index(name): 
  name as $name | 
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;

I get:

$ time ./fast-recursive 
1999

real    0m52.628s
user    0m52.657s
sys     0m0.005s

Ouch! But we can do better. @peak mentioned an undocumented switch, --debug-dump-disasm, which lets you see how jq compiles your code. With it you can see that modifying the object, passing it to _get_index, and then extracting the array, length, and index on every call is expensive. Refactoring to pass just the index is a huge improvement, and a further refinement that uses a sentinel to avoid testing the index against the length on every step makes it competitive with the iterative version:
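For readers who want to reproduce this kind of inspection, the switch is passed like any other option and prints jq's compiled bytecode before running the program. Because the listing format is undocumented and may differ between jq versions, this sketch only shows the invocation, on a tiny hypothetical helper count_up, rather than what the listing looks like.

```shell
#!/bin/bash
# Dump the bytecode jq generates for a small tail-recursive helper, then run it.
# --debug-dump-disasm is undocumented; do not rely on its output format.
jq -n --debug-dump-disasm '
  def count_up: if . < 3 then . + 1 | count_up else . end;
  0 | count_up
'
```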

def indexof($name):
  (.+[{name: $name}]) as $a | # add a "sentinel"
  length as $l | # note length sees original array
  def _indexof:
    if ($a[.].name == $name) then
      if (. != $l) then . else null end
    else
      .+1 | _indexof
    end
  ;

  0 | _indexof
;
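To see the sentinel version in action on the question's input, here is a minimal wrapper, assuming jq 1.5+ (the shell function name indexof_sentinel is hypothetical). Because {name: $name} is appended, the scan always stops at some match; only when that match sits at position $l, the sentinel itself, is the result null.

```shell
#!/bin/bash
WORKSPACES='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

indexof_sentinel() {
  jq --arg name "$1" '
    def indexof($name):
      (. + [{name: $name}]) as $a |   # sentinel guarantees termination
      length as $l |                  # length of the original array
      def _indexof:
        if $a[.].name == $name then
          if . != $l then . else null end   # hitting the sentinel means "not found"
        else
          . + 1 | _indexof
        end;
      0 | _indexof;
    indexof($name)
  ' <<<"$WORKSPACES"
}

indexof_sentinel something   # prints 2
indexof_sentinel missing     # prints null
```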

I get:

$ time ./fast-recursive2
null

real    0m13.238s
user    0m13.243s
sys     0m0.005s

So it appears that if each element is equally likely to be the target and you care about average-case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)
