Bash: find non-repeated elements in an array


Problem description

I'm looking for a way to find non-repeated elements in an array in bash.

Simple example:

joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860 CVE-2016-3598)
<magic>
non_repeated=(CVE-2016-3598)

To give context, the goal here is to end up with an array of all package update CVEs that aren't generally available via 'yum update' on a host due to being excluded. The way I came up with doing such a thing is to populate 3 preliminary arrays:

  • available_updates=() # just what 'yum update' would provide
  • all_updates=() # including excluded ones
  • joined_updates=() # contents of both prior arrays

Then apply logic to joined_updates=() that would return only elements that are included exactly once. Any element with two occurrences is one that can be updated normally and doesn't need to end up in the 'excluded_updates=()' array.

Hopefully this makes sense. As I was typing it out I'm wondering if it might be simpler to just remove all elements found in available_updates=() from all_updates=(), leaving the remaining ones as the excluded updates.

Thanks!

Recommended answer

One pure-bash approach is to store a counter in an associative array, and then look for items where the counter is exactly one:

declare -A seen=( )                   # create an associative array (requires bash 4)
for item in "${joined_arrays[@]}"; do # iterate over original items
  (( seen[$item] += 1 ))              # increment value associated with item
done

declare -a non_repeated=( )
for item in "${!seen[@]}"; do         # iterate over keys
  if (( ${seen[$item]} == 1 )); then  # if counter for that key is 1...
    non_repeated+=( "$item" )         # ...add that item to the output array.
  fi
done

declare -p non_repeated               # print result
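
The same associative-array technique also covers the simpler idea raised at the end of the question: drop everything found in available_updates=() from all_updates=(). The following is only a sketch. The array contents are made-up sample data, the lookup table name `available` is hypothetical, and it assumes bash 4+ and non-empty values (an empty string is not a valid associative-array key).

```bash
#!/usr/bin/env bash
# Hypothetical sample data; in practice these would be filled from yum output.
available_updates=(CVE-2015-4840 CVE-2015-4860)
all_updates=(CVE-2015-4840 CVE-2015-4860 CVE-2016-3598)

# Build a lookup table keyed by every normally available CVE.
declare -A available=( )
for item in "${available_updates[@]}"; do
  available[$item]=1
done

# Keep only the entries of all_updates that are absent from the lookup table.
excluded_updates=( )
for item in "${all_updates[@]}"; do
  [[ -n ${available[$item]:-} ]] || excluded_updates+=( "$item" )
done

declare -p excluded_updates            # print result
```

With this sample data only CVE-2016-3598 should remain. Unlike the counting approach, this does not depend on each CVE appearing at most once per source array.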


Another, terser (but buggier -- doesn't work with values containing newline literals) approach is to take advantage of standard text manipulation tools:

non_repeated=( )        # setup

# use uniq -c to count; filter for results with a count of 1
while read -r count value; do
  (( count == 1 )) && non_repeated+=( "$value" )
done < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c)

declare -p non_repeated # print result

...or, even terser (and buggier, requiring that the array value split into exactly one field in awk):

readarray -t non_repeated \
  < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -c | awk '$1 == 1 { print $2; }')


To crib an answer I really should have come up with myself from @Aaron (who deserves an upvote from anyone using this; do note that it retains the doesn't-work-with-values-with-newlines bug), one can also use uniq -u:

readarray -t non_repeated < <(printf '%s\n' "${joined_arrays[@]}" | sort | uniq -u)
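
If values containing newlines matter, one possible workaround is to delimit the stream with NUL bytes instead of newlines. This is a sketch under assumptions not stated in the original answer: it needs bash 4.4 or later for `readarray -d ''`, and GNU coreutils for the `-z` options of `sort` and `uniq`. The extra element `$'two\nlines'` is made-up test data.

```bash
#!/usr/bin/env bash
# Sample data from the question plus one hypothetical value with an embedded newline.
joined_arrays=(CVE-2015-4840 CVE-2015-4840 CVE-2015-4860 CVE-2015-4860
               CVE-2016-3598 $'two\nlines')

# Emit each value NUL-terminated, sort and filter on NUL boundaries,
# then read the NUL-delimited result back into an array.
readarray -d '' -t non_repeated \
  < <(printf '%s\0' "${joined_arrays[@]}" | sort -z | uniq -z -u)

declare -p non_repeated                # print result
```

Here the result should hold two elements, CVE-2016-3598 and the two-line value, since both occur exactly once. Note that an empty input array would still produce one empty element, because printf emits its format once even with no arguments.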
