Getting the object array index in jq
Question
I have a JSON array that looks like this (produced by i3-msg -t get_workspaces):
[
  {
    "name": "1",
    "urgent": false
  },
  {
    "name": "2",
    "urgent": false
  },
  {
    "name": "something",
    "urgent": false
  }
]
I am trying to use jq to figure out the index of the list element matched by a select query. jq has something called index(), but it seems to support only strings?
Using something like i3-msg -t get_workspaces | jq '.[] | select(.name=="something")' gives me the object I want, but I want its index. In this case that is 2 (counting from 0).
Is this possible with jq alone?
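Recommended answer
I provided a strategy for a solution, which the OP quickly accepted. Subsequently, @peak and @Jeff Mercado offered better and more complete solutions, so I have turned this answer into a community wiki. Please improve it if you can.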
A straightforward solution (pointed out by @peak) is to use the builtin function index:
map(.name == "something") | index(true)
The jq documentation confusingly suggests that index operates only on strings, but it operates on arrays as well. Thus index(true) returns the index of the first true in the array of booleans produced by the map. If no item satisfies the condition, the result is null.
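As a quick check, the sketch below runs this filter against the sample data from the question, inlined as a shell variable instead of calling i3-msg. The find_index helper and the use of --arg to pass the name are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Sample data from the question, inlined so the example does not need i3.
workspaces='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# find_index NAME: hypothetical helper; prints the index of the first
# workspace called NAME, or null if there is none.
find_index() {
  printf '%s\n' "$workspaces" |
    jq --arg name "$1" 'map(.name == $name) | index(true)'
}

find_index something   # prints 2
find_index missing     # prints null
```

Passing the name with --arg keeps shell quoting out of the jq program, which is convenient when the workspace name comes from user input.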
jq expressions are evaluated in a "lazy" manner, but map will traverse the entire input array. We can verify this by rewriting the above code and introducing some debug statements:
[ .[] | debug | .name == "something" ] | index(true)
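One way to observe the full traversal, sketched here under the assumption that jq writes each debug message to stderr as a single line containing "DEBUG:", is to count those messages while searching for the very first element. The count_visits helper is hypothetical.

```shell
#!/bin/sh
# Sample data from the question, inlined so the example does not need i3.
workspaces='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# count_visits NAME: hypothetical helper; prints how many elements the
# filter visited, by counting the debug lines jq emits on stderr.
count_visits() {
  printf '%s\n' "$workspaces" |
    jq --arg name "$1" '[ .[] | debug | .name == $name ] | index(true)' 2>&1 |
    grep -c '"DEBUG:"'
}

count_visits 1   # the match is at index 0, yet all 3 elements are visited
```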
As suggested by @peak, the key to doing better is to use the break statement introduced in jq 1.5:
label $out |
foreach .[] as $item (
  -1;
  .+1;
  if $item.name == "something" then
    .,
    break $out
  else
    empty
  end
) // null
Note that // is not a comment; it is the alternative operator. If the name is not found, the foreach produces empty, which the alternative operator converts to null.
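The following sketch exercises both branches on the sample data: a name that is present and one that is not. The first_index helper and the --arg parameter are assumptions added for illustration; the filter itself is the one shown above with the literal name replaced by $name.

```shell
#!/bin/sh
# Sample data from the question, inlined so the example does not need i3.
workspaces='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# first_index NAME: hypothetical helper wrapping the label/foreach filter.
first_index() {
  printf '%s\n' "$workspaces" |
    jq --arg name "$1" '
      label $out
      | foreach .[] as $item (
          -1;                     # initial state: one before the first index
          . + 1;                  # update: index of the current element
          if $item.name == $name
          then ., break $out      # emit the index, then stop iterating
          else empty              # emit nothing for non-matching elements
          end
        ) // null                 # no output at all means the name is absent
    '
}

first_index something   # prints 2
first_index missing     # prints null
```

Note that a match at index 0 is still reported as 0: in jq only false and null count as "absent" for the alternative operator.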
Another approach is to process the array recursively:
def get_index(name):
  name as $name |
  if (. == []) then
    null
  elif (.[0].name == $name) then
    0
  else
    (.[1:] | get_index($name)) as $result |
    if ($result == null) then null else $result+1 end
  end;
get_index("something")
However, as @Jeff Mercado pointed out, this recursive implementation uses stack space proportional to the length of the array in the worst case. Version 1.5 of jq introduced Tail Call Optimization (TCO), which lets us optimize this away using a local helper function (note that this is a minor adaptation of a solution provided by @Jeff Mercado, made to be consistent with the above examples):
def get_index(name):
  name as $name |
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
get_index("something")
According to @peak, obtaining the length of an array in jq is a constant-time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this.
Now let's try to actually measure. Here is an example of measuring the simple solution:
#!/bin/bash
jq -n '
  def get_index(name):
    name as $name |
    map(.name == $name) | index(true)
  ;
  def gen_input(n):
    n as $n |
    if ($n == 0) then
      []
    else
      gen_input($n-1) + [ { "name": $n, "urgent":false } ]
    end
  ;
  2000 as $n |
  gen_input($n) as $i |
  [(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1]
'
When I run this on my machine, I get the following:
$ time ./simple
1999
real 0m10.024s
user 0m10.023s
sys 0m0.008s
If I replace this with the "fast" version of get_index:
def get_index(name):
  name as $name |
  label $out |
  foreach .[] as $item (
    -1;
    .+1;
    if $item.name == $name then
      .,
      break $out
    else
      empty
    end
  ) // null;
then I get:
$ time ./fast
1999
real 0m13.165s
user 0m13.173s
sys 0m0.000s
And if I replace it with the "fast" recursive version:
def get_index(name):
  name as $name |
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
I get:
$ time ./fast-recursive
1999
real 0m52.628s
user 0m52.657s
sys 0m0.005s
Ouch! But we can do better. @peak mentioned an undocumented switch, --debug-dump-disasm, which lets you see how jq is compiling your code. With this you can see that modifying the object, passing it to the recursive helper, and then extracting the array, length, and index is expensive. Refactoring to pass just the index is a huge improvement, and a further refinement that avoids testing the index against the length makes it competitive with the iterative version:
def indexof($name):
  (.+[{name: $name}]) as $a | # add a "sentinel"
  length as $l | # note length sees original array
  def _indexof:
    if ($a[.].name == $name) then
      if (. != $l) then . else null end
    else
      .+1 | _indexof
    end
  ;
  0 | _indexof
;
I get:
$ time ./fast-recursive2
null
real 0m13.238s
user 0m13.243s
sys 0m0.005s
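Separately from the timing, the sentinel version can be sanity-checked on the sample data from the question. This is a minimal sketch: the sentinel_index wrapper and the --arg parameter are hypothetical additions, and the jq function is the indexof definition shown above.

```shell
#!/bin/sh
# Sample data from the question, inlined so the example does not need i3.
workspaces='[{"name":"1","urgent":false},{"name":"2","urgent":false},{"name":"something","urgent":false}]'

# sentinel_index NAME: hypothetical wrapper around the indexof function.
sentinel_index() {
  printf '%s\n' "$workspaces" |
    jq --arg name "$1" '
      def indexof($name):
        (. + [{name: $name}]) as $a    # append a sentinel that always matches
        | length as $l                 # length of the original array
        | def _indexof:
            if $a[.].name == $name
            then (if . != $l then . else null end)   # sentinel hit: not found
            else . + 1 | _indexof                    # tail call with next index
            end;
          0 | _indexof;
      indexof($name)
    '
}

sentinel_index something   # prints 2
sentinel_index missing     # prints null
```

Because the sentinel guarantees a match, the helper needs only one comparison per element, and the bounds check happens once, at the end.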
So it appears that if each element is equally likely to be the target and you care about average-case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)