Removing duplicates in a bash string using awk


Problem description

I was trying to apply the method proposed in "Removing duplicates on a variable without sorting" to remove duplicates in a string using awk, when I noticed it was not working as expected.

For example, suppose we have:

s="apple apple tree appleapple tree"

After removing duplicates, we expect the following output:

apple tree appleapple

This should be obtained by applying the following command to the string (the link has the complete explanation):

awk 'BEGIN{RS=" "; ORS=" "}{ if(a[$0] == 0){a[$0]+=1; print $0}}' <<< $s

It uses an associative array, so we do not expect the same record to be printed twice. However, following this method I get:

 apple tree appleapple tree

The first duplicate (apple) was removed as desired, but the last one (tree) was not. In fact, if we print the length of each record, we see that the last record is not tree but tree plus a return character (I suppose).

$ awk 'BEGIN{RS=" "; ORS=" "}{ print length($0); print $0}' <<< $s
5 apple 5 apple 4 tree 10 appleapple 5 tree

Notice that the last tree is indeed 5 characters long, not 4, which breaks the associative array method.
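A quick way to check where the fifth character comes from is to compare byte counts. The sketch below assumes bash and a standard `wc`; the variable names `here_bytes` and `raw_bytes` are made up for illustration. A bash here-string (`<<<`) appends a newline to the string it feeds, whereas `printf '%s'` does not:

```shell
s="apple apple tree appleapple tree"

# Bytes delivered by a here-string (bash appends a newline to the string).
here_bytes=$(wc -c <<< "$s")

# Bytes delivered by printf '%s' (the string exactly as stored).
raw_bytes=$(printf '%s' "$s" | wc -c)

echo "here-string: $here_bytes bytes, printf: $raw_bytes bytes"
```

The counts differ by one: the extra byte is the newline added by the here-string, which ends up attached to the last record because `RS=" "` splits only on spaces.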

I do not understand why this character is there or where it comes from. How can I fix this so that the method removes the duplicates?

Many thanks for any suggestions.

Recommended answer

If you don't need to maintain the word order:

$ ( set -f; printf "%s\n" $s | sort -u | paste -sd" " )
apple appleapple tree

If you want to preserve the order:

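$ awk '
    {
        delete seen                     # reset the seen-words table for each input line
        sep=""
        for (i=1; i<=NF; i++) {
            if (!seen[$i]++) {          # true only the first time a word appears
                printf "%s%s", sep, $i
            }
            sep=OFS
        }
        print ""                        # terminate the output line
    }
' <<<"$s"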
apple tree appleapple
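
If you would rather keep the record-per-word approach from the question, a minimal sketch (assuming the string holds no newlines of its own and words are separated by single spaces) is to feed awk through `printf '%s'` instead of a here-string, so no trailing newline gets attached to the last word. The `result` variable and the `sep` helper are illustrative names, not part of the original command:

```shell
s="apple apple tree appleapple tree"

# printf '%s' passes the string without a trailing newline, so the last
# record is exactly "tree" and matches the earlier entry in seen[].
result=$(printf '%s' "$s" | awk '
    BEGIN { RS = " " }                  # one word per record
    !seen[$0]++ {                       # first occurrence of this word only
        printf "%s%s", sep, $0
        sep = " "                       # separator goes before later words
    }
    END { print "" }                    # finish the output line
')

echo "$result"
```

Printing the separator before each word, rather than setting `ORS=" "`, also avoids the stray trailing space the original one-liner produces.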
