How can I find repeated words in a file using grep/egrep?


Question

I need to find repeated words in a file using egrep (or grep -E) in Unix (bash).

I tried:

egrep "(\<[a-zA-Z]+\>) \1" file.txt

egrep "(\b[a-zA-Z]+\b) \1" file.txt



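But for some reason, these report things as repeats that are not. For example, a line such as "word words" is treated as a match, despite the word-boundary conditions \> and \b.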

Recommended answer

\1 matches whatever string the first capture group matched. That is not the same as re-applying the pattern inside the group. So the fact that the first capture ended on a word boundary no longer matters for \1, even though the \b is inside the capturing parentheses.
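A minimal sketch of the behaviour described above, assuming GNU grep (backreferences and \b in extended regular expressions are extensions, not POSIX). The three sample lines are hypothetical, and `grep -E` is used as the equivalent of `egrep`:

```shell
# Hypothetical sample input: only the first line contains a true repeated word.
sample=$(printf '%s\n' 'this is is a test' 'one word words here' 'no repeats here')

# The pattern from the question: \1 only has to match the text "word",
# so it is also satisfied by the first four letters of "words".
loose=$(printf '%s\n' "$sample" | grep -E '(\b[a-zA-Z]+\b) \1')

printf '%s\n' "$loose"
```

With this input, I would expect both the "is is" line and the "word words" line to be printed, which is the false positive the question describes.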

If you want the second instance to also end on a word boundary, you need to say so:

egrep "(\b[a-zA-Z]+) \1\b" file.txt



That is no different from:

egrep "\b([a-zA-Z]+) \1\b" file.txt

The space in the pattern already forces a word boundary on either side of it, so I dropped the redundant \bs. If you want to be more explicit, you can put them back in:

egrep "\<([a-zA-Z]+)\> \<\1\>" file.txt
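To check that the three corrected patterns behave the same way, here is a small sketch run against hypothetical sample lines (not from the original question). It assumes GNU grep, since \b, \< and \>, and backreferences under `-E` are extensions:

```shell
# Hypothetical sample input: only the first line contains a true repeated word.
sample=$(printf '%s\n' 'this is is a test' 'one word words here' 'no repeats here')

# Each variant requires a word boundary before the first word and after the second.
strict_a=$(printf '%s\n' "$sample" | grep -E '(\b[a-zA-Z]+) \1\b')
strict_b=$(printf '%s\n' "$sample" | grep -E '\b([a-zA-Z]+) \1\b')
strict_c=$(printf '%s\n' "$sample" | grep -E '\<([a-zA-Z]+)\> \<\1\>')

printf '%s\n' "$strict_a" "$strict_b" "$strict_c"
```

Each variant should report only the "is is" line and skip "word words". The patterns are single-quoted here so the shell passes the backslashes to grep unchanged; the double-quoted forms above work in bash as well.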
