How to remove duplicate words from a string in a Bash script?
Problem description
I have a string containing duplicate words, for example:
abc, def, abc, def
How can I remove the duplicates? The string that I need is:
abc, def
Recommended answer
We have the following test file:
$ cat file
abc, def, abc, def
To remove duplicate words:
$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def
How it works
- :a
  This defines a label named a.
- s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g
  This looks for a duplicated word consisting of alphanumeric characters and removes the second occurrence, leaving its separator behind.
- ta
  If the last substitution command resulted in a change, this jumps back to label a to try again. In this way, the code keeps looking for duplicates until none remain.
- s/(, )+/, /g; s/, *$//
  These two substitution commands clean up any leftover comma-space combinations.
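To make the two phases visible, the command can be split into the loop and the cleanup. This is only a sketch, assuming GNU sed (where -E is equivalent to the -r used above); the variable names input, after_loop and cleaned are illustrative and not part of the original answer.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed, which supports \b and back-references with -E.
input='abc, def, abc, def'

# Phase 1: run only the label / substitute / branch loop.
# Each pass deletes one repeated word but keeps the ", " that preceded it.
after_loop=$(printf '%s\n' "$input" |
    sed -E ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta')
printf '[%s]\n' "$after_loop"   # prints [abc, def, , ]

# Phase 2: collapse runs of ", " and strip the trailing separator.
cleaned=$(printf '%s\n' "$after_loop" |
    sed -E 's/(, )+/, /g; s/, *$//')
printf '[%s]\n' "$cleaned"      # prints [abc, def]
```

The brackets in the output are there only to make the trailing separators left by the loop easy to see.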
For macOS or another BSD system, try:
sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file
Using a string instead of a file
sed easily handles input either from a file, as shown above, or from a shell string as shown below:
$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
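Inside a Bash script, the string usually lives in a variable. One possible way to wrap the same sed command is sketched below; dedupe_words is a hypothetical helper name, the here-string (<<<) is a Bash feature, and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# dedupe_words is a hypothetical helper, not part of the original answer.
# Assumption: GNU sed (-E is equivalent to -r there).
dedupe_words() {
    # Feed the first argument to sed through a Bash here-string.
    sed -E ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' <<< "$1"
}

words='ab, cd, cd, ab, ef'
result=$(dedupe_words "$words")
echo "$result"   # prints: ab, cd, ef
```

Capturing the output with $(...) keeps the deduplicated string available for the rest of the script.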