Nested associative arrays in bash

Question

Can one construct an associative array whose elements contain arrays in bash? For instance, suppose one has the following arrays:

a=(a aa)
b=(b bb bbb)
c=(c cc ccc cccc)

Can one create an associative array to access these variables? For instance,

declare -A letters
letters[a]=$a
letters[b]=$b
letters[c]=$c

and then access the elements with something like

letter=${letters[a]}
echo ${letter[1]}

This mock syntax for creating and accessing elements of the associative array does not work. Do valid expressions accomplishing the same goals exist?

Recommended answer

I think the more straightforward answer is "No, bash arrays cannot be nested." Anything that simulates nested arrays is actually just creating fancy mapping functions for the keyspace of the (single layered) arrays.
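To see why the mock syntax from the question cannot work, here is a small sketch of what bash actually stores. It relies on the documented rule that referencing an array without a subscript is the same as referencing element 0; the variable names `first` and `second` are only for illustration.

```bash
#!/usr/bin/env bash
b=(b bb bbb)

declare -A letters
letters[b]=$b          # $b means ${b[0]}, so only the string "b" is stored

letter=${letters[b]}   # letter is a plain string, not an array
first=${letter[0]}     # indexing a scalar at 0 yields the scalar itself
second=${letter[1]}    # any other index is empty: bb and bbb were never copied

echo "letter=$letter first=$first second=$second"
```

Array values are always flat strings, which is why every workaround below ends up encoding the nesting in the key instead.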

Not that that's bad: it may be exactly what you want, but especially when you don't control the keys into your array, doing it properly becomes harder. Although I like the solution given by @konsolebox of using a delimiter, it ultimately falls over if your keyspace includes keys like "p|q". It does have a nice benefit in that you can operate transparently on your keys, as in array[abc|def] to look up the key def in array[abc], which is very clear and readable. Because it relies on the delimiter not appearing in the keys, this is only a good approach when you know what the keyspace looks like now and in all future uses of the code. This is only a safe assumption when you have strict control over the data.
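As a rough sketch of the delimiter approach described above (not necessarily the exact code from the other answer), the arrays from the question can be flattened into a single associative array whose keys have the form `name|index`. This assumes that `|` never appears in any key.

```bash
#!/usr/bin/env bash
declare -A letters

a=(a aa)
b=(b bb bbb)

# Flatten each indexed array into letters["name|index"].
# Assumption: '|' never occurs in a name or an index.
for i in "${!a[@]}"; do letters["a|$i"]=${a[i]}; done
for i in "${!b[@]}"; do letters["b|$i"]=${b[i]}; done

second_of_a=${letters["a|1"]}
third_of_b=${letters["b|2"]}
echo "$second_of_a $third_of_b"
```

If a name such as `p|q` were ever used, `letters["p|q|0"]` could no longer be split back into its parts unambiguously, which is the failure mode mentioned above.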

If you need any kind of robustness, I would recommend concatenating hashes of your array keys. This is a simple technique that is extremely likely to eliminate conflicts, although they are possible if you are operating on extremely carefully crafted data.

To borrow a bit from how Git handles hashes, let's take the first 8 characters of the sha512sums of keys as our hashed keys. Eight hexadecimal characters are only 32 bits, so accidental collisions become plausible once a script handles tens of thousands of distinct keys. If you feel nervous about this, you can always use the whole sha512sum, since there are no known collisions for sha512. Using the whole checksum makes sure that you are safe, but it is a little more burdensome.

So, if I want the semantics of storing an element in array[abc][def] what I should do is store the value in array["$(keyhash "abc")$(keyhash "def")"] where keyhash looks like this:

function keyhash () {
    # printf avoids echo's option parsing, so keys such as "-n" hash correctly.
    printf '%s' "$1" | sha512sum | cut -c-8
}
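A minimal usage sketch, assuming `sha512sum` from GNU coreutils is on the PATH. `keyhash` is redefined here so the snippet runs on its own; it uses `printf` rather than `echo` so that keys like `-n` are hashed rather than swallowed as options. The variable names `k1`, `k2`, `lookup` and `other` are illustrative.

```bash
#!/usr/bin/env bash
keyhash() { printf '%s' "$1" | sha512sum | cut -c-8; }

declare -A array

# Store the equivalent of array[abc][def] and array[p|q][def].
k1="$(keyhash "abc")$(keyhash "def")"
k2="$(keyhash "p|q")$(keyhash "def")"
array[$k1]="value for abc/def"
array[$k2]="value for p|q/def"

# Retrieve them by rebuilding the same hashed key.
lookup=${array["$(keyhash "abc")$(keyhash "def")"]}
other=${array["$(keyhash "p|q")$(keyhash "def")"]}
echo "$lookup"
echo "$other"
```

Because the keys are hashed before they are joined, a key containing `|` (or any other character) needs no special treatment.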

You can then pull out the elements of the associative array using the same keyhash function. Funnily, there's a memoized version of keyhash you can write which uses an array to store the hashes, preventing extra calls to sha512sum, but it gets expensive in terms of memory if the script takes many keys:

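declare -A keyhash_array
function keyhash () {
    # Compute the hash only on a cache miss.
    if [[ -z "${keyhash_array[$1]}" ]]; then
        keyhash_array[$1]="$(printf '%s' "$1" | sha512sum | cut -c-8)"
    fi
    # Caveat: when called as $(keyhash ...), this runs in a subshell, so the
    # cache entry written above is lost when the substitution returns.
    printf '%s\n' "${keyhash_array[$1]}"
}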
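One caveat with memoizing inside a function that is called as `$(keyhash ...)`: command substitution runs in a subshell, so anything written to the cache there is discarded. A possible workaround, sketched below, is to return the hash in a variable instead of printing it. The names `keyhash_r` and `keyhash_cache` are hypothetical, and the sketch assumes non-empty keys (bash rejects an empty associative-array subscript) and `sha512sum` on the PATH.

```bash
#!/usr/bin/env bash
declare -A keyhash_cache

# keyhash_r (hypothetical name) leaves its result in REPLY instead of
# printing it, so it runs in the current shell and cache writes persist.
keyhash_r() {
    if [[ -z ${keyhash_cache[$1]+set} ]]; then
        keyhash_cache[$1]=$(printf '%s' "$1" | sha512sum | cut -c-8)
    fi
    REPLY=${keyhash_cache[$1]}
}

keyhash_r "abc"; outer=$REPLY
keyhash_r "def"; inner=$REPLY
keyhash_r "abc"                       # served from the cache this time
cached_entries=${#keyhash_cache[@]}

echo "combined key: $outer$inner (cache holds $cached_entries entries)"
```

The trade-off is a clumsier calling convention in exchange for a cache that actually survives between calls.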

A length inspection on a given key tells me how many layers deep it looks into the array, since that's just len/8, and I can see the subkeys for a "nested array" by listing keys and trimming those that have the correct prefix. So if I want all of the keys in array[abc], what I should really do is this:

for key in "${!array[@]}"
do
    if [[ "$key" == "$(keyhash "abc")"* ]];
    then
        # do stuff with "$key" since it's a key directly into the array
        :
    fi
done
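The length check and the prefix match can be combined to pick out only the direct children of one level. The following is a sketch with made-up keys (`abc`, `def`, `ghi`, `xyz`), assuming 8-character hashes and `sha512sum` on the PATH. Note that the hashes are one-way, so the loop can recover the values but not the original subkey names.

```bash
#!/usr/bin/env bash
keyhash() { printf '%s' "$1" | sha512sum | cut -c-8; }

declare -A array
h_abc=$(keyhash abc); h_def=$(keyhash def)
h_ghi=$(keyhash ghi); h_xyz=$(keyhash xyz)

array[$h_abc]="value stored at abc itself"
array[$h_abc$h_def]="abc/def"
array[$h_abc$h_ghi]="abc/ghi"
array[$h_xyz$h_def]="xyz/def"

children=0
for key in "${!array[@]}"; do
    depth=$(( ${#key} / 8 ))          # 8 hash characters per level
    if [[ $key == "$h_abc"* && $depth -eq 2 ]]; then
        children=$(( children + 1 ))
        echo "child of abc: ${array[$key]}"
    fi
done
echo "abc has $children direct children"
```

If the original subkey names are needed later, they have to be stored alongside the values, for example in a second associative array keyed by the same hash.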

Interestingly, this also means that first level keys are valid and can contain values. So, array["$(keyhash "abc")"] is completely valid, which means this "nested array" construction can have some interesting semantics.

In one form or another, any solution for nested arrays in Bash is pulling this exact same trick: produce a (hopefully injective) mapping function f(key,subkey) which produces strings that can be used as array keys. This can always be applied further as f(f(key,subkey),subsubkey) or, in the case of the keyhash function above, I prefer to define f(key) and apply it to subkeys as concat(f(key),f(subkey)) and concat(f(key),f(subkey),f(subsubkey)). In combination with memoization for f, this is a lot more efficient. In the case of the delimiter solution, nested applications of f are necessary, of course.
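The concat(f(key),f(subkey),...) form generalizes to any depth with a small wrapper. This is only a sketch: `nested_key` is a hypothetical helper name, and `keyhash` is redefined so the snippet is self-contained (assuming `sha512sum` is available).

```bash
#!/usr/bin/env bash
keyhash() { printf '%s' "$1" | sha512sum | cut -c-8; }

# nested_key (hypothetical helper): concatenate the hash of every argument,
# i.e. concat(f(k1), f(k2), ..., f(kn)).
nested_key() {
    local part out=""
    for part in "$@"; do
        out+=$(keyhash "$part")
    done
    printf '%s\n' "$out"
}

declare -A array
k=$(nested_key abc def ghi)           # stands in for array[abc][def][ghi]
array[$k]="deep value"

echo "${array[$(nested_key abc def ghi)]}"
```

Each level contributes a fixed 8 characters, so the depth of any key is still its length divided by 8.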

With that known, the best solution that I know of is to take a short hash of the key and subkey values.

I recognize that there's a general dislike for answers of the type "You're doing it wrong, use this other tool!" but associative arrays in bash are messy on numerous levels, and run you into trouble when you try to port code to a platform that (for some silly reason or another) doesn't have bash on it, or has an ancient (pre-4.x) version. If you are willing to look into another language for your scripting needs, I'd recommend picking up some awk.

It provides the simplicity of shell scripting with the flexibility that comes with more feature rich languages. There are a few reasons I think this is a good idea:

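  • GNU awk (the most prevalent variant) has fully fledged associative arrays which can nest properly, with the intuitive syntax of array[key][subkey]
  • You can embed awk in shell scripts, so you still get the tools of the shell when you really need them
  • awk is stupidly simple at times, which puts it in stark contrast with other shell replacement languages like Perl and Python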

That's not to say that awk is without its failings. It can be hard to understand when you're first learning it because it's heavily oriented towards stream processing (a lot like sed), but it's a great tool for a lot of tasks that are just barely outside of the scope of the shell.

Note that above I said that "GNU awk" (gawk) has multidimensional arrays. Other awks actually do the trick of separating keys with a well-defined separator, SUBSEP. You can do this yourself, as with the array[a|b] solution in bash, but nawk has this feature builtin if you do array[key,subkey]. It's still a bit more fluid and clear than bash's array syntax.
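For comparison, here is a small sketch of the portable `array[key,subkey]` form embedded in a shell script. It uses only features found in POSIX awk (subscripts joined with SUBSEP and the `(i, j) in array` membership test), so it should not require gawk; the array contents mirror the question's example.

```bash
#!/usr/bin/env bash
result=$(awk 'BEGIN {
    # Each pair of subscripts is joined with SUBSEP into one flat key.
    letters["a", 1] = "aa"
    letters["b", 2] = "bbb"

    # Membership test for a multi-part subscript.
    if (("b", 2) in letters)
        print letters["b", 2]
}')
echo "$result"
```

With gawk specifically, the same data could be written with true nested subscripts such as `letters["b"][2]`, at the cost of portability to other awks.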
