Remove duplicate words in a line with sed
Question
Purely academic, but it's frustrating me.
I want to correct this text:
there there are are multiple lexical errors in this line line
using sed. I've got this far:
sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' < file.text
It corrects everything except the final doubled up words!
there are multiple lexical errors in this line line
Can a sed guru please explain why the above doesn't deal with the words at the end?
Recommended answer
This is because in the last case ("line line"), your regex memory \1 holds "line " ("line" followed by a space), and you are searching for a repetition of it. Since there is no space after the last "line", the match fails.
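A minimal sketch of the failure, assuming GNU sed (which treats the \n inside [ ,\n] as a newline) and using a hypothetical helper name, dedup_orig, for the question's command. The same pair is left alone at the end of the line, but collapses once a space follows it.

```shell
# The regex from the question, wrapped in a function.
# dedup_orig is a hypothetical name used only for this illustration.
dedup_orig() {
  sed 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g'
}

# No trailing space: \1 holds "line " but only "line" follows it,
# so nothing is replaced. Prints "in this line line".
printf 'in this line line\n' | dedup_orig

# With a trailing space the second "line " is present, so the pair
# collapses to a single "line ". Prints "in this line " (trailing space).
printf 'in this line line \n' | dedup_orig
```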
To fix this, add a space after the ending word "line".
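If you would rather not edit the input, one option is to let sed append the space itself, run the original substitution, and then strip the added space again. This is a sketch assuming GNU sed; dedup_padded is a hypothetical name.

```shell
# Pad each line with a space, apply the question's regex unchanged,
# then remove the padding. Assumes GNU sed for the [ ,\n] bracket.
# dedup_padded is a hypothetical name used only for this illustration.
dedup_padded() {
  sed -e 's/$/ /' \
      -e 's/\([a-z][a-z]*[ ,\n][ ,\n]*\)\1/\1/g' \
      -e 's/ $//'
}

# Prints "there are multiple lexical errors in this line".
printf 'there there are are multiple lexical errors in this line line\n' | dedup_padded
```

The three -e expressions run in order on the same pattern space, so the padding is only ever visible to the middle substitution.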
Alternatively you can change the regex to:
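sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'
This relies on GNU sed extensions (\b and \+). The trailing \b after \1 stops the repeat from matching the start of a longer word; without it, "the theory" would be turned into "theory".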
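A short check of the word-boundary version, assuming GNU sed; dedup_words is a hypothetical wrapper name. The second call shows that a word which merely begins with the previous word is left alone.

```shell
# Word-boundary version of the substitution (GNU sed: \b and \+).
# dedup_words is a hypothetical name used only for this illustration.
dedup_words() {
  sed -e 's/\b\([a-z]\+\)[ ,\n]\1\b/\1/g'
}

# Prints "there are multiple lexical errors in this line".
printf 'there there are are multiple lexical errors in this line line\n' | dedup_words

# Prints "the theory": "the" is followed by "the" only as a prefix,
# and the trailing \b rejects that match.
printf 'the theory\n' | dedup_words
```

Because \b also matches at the end of the line, no trailing space is needed here.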