Regex multi word search
Question
What do I use to search for multiple words in a string? I would like the logical operation to be AND, so that all the words appear somewhere in the string. I have a bunch of nonsense paragraphs and one plain English paragraph, and I'd like to narrow it down by specifying a couple of common words such as "the" and "and", but I would like it to match all the words I specify.
Recommended answer
Using a language recognition chart to recognize English might work. A few quick tests suggest it does (this assumes paragraphs are separated by newlines only).
The regexp matches as soon as any one of its alternatives is found. \bword\b matches word as a whole word delimited by word boundaries, word\b matches word at the end of a word, and a bare word matches anywhere in the paragraph being tested.
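# $text is assumed to hold the full input, one paragraph per line.
my @paragraphs = split(/\n/, $text);
for my $p (@paragraphs) {
    # A single hit on any alternative (OR semantics) flags the paragraph.
    if ($p =~ m/\bthe\b|\band\b|\ban\b|\bin\b|\bon\b|\bthat\b|\bis\b|\bare\b|th|sh|ough|augh|ing\b|tion\b|ed\b|age\b|’s\b|’ve\b|n’t\b|’d\b/) {
        print "Probable english\n$p\n";
    }
}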
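The alternation above has OR semantics, whereas the question asks for AND. One common way to require every word is to chain one lookahead per word. The following is only a minimal sketch under assumptions: contains_all_words is a hypothetical helper name, the sample paragraphs are made up for illustration, and matching is whole-word and case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only if every word in @words
# occurs in $text as a whole word (case-insensitive), else 0.
sub contains_all_words {
    my ($text, @words) = @_;
    # One lookahead per word, e.g. (?=.*?\bthe\b)(?=.*?\band\b)
    my $pattern = join '', map { '(?=.*?\b' . quotemeta($_) . '\b)' } @words;
    # /s lets . cross newlines; /i ignores case.
    return $text =~ /\A$pattern/is ? 1 : 0;
}

# Made-up sample data: one nonsense line, one English line.
my @paragraphs = (
    "xq zvbn the wrtl pqoo",
    "The cat sat on the mat and purred.",
);

# Keep only paragraphs that contain both "the" AND "and".
my @hits = grep { contains_all_words($_, 'the', 'and') } @paragraphs;
print "$_\n" for @hits;
```

Because each lookahead is anchored at the start of the string and consumes nothing, the order of the words in the paragraph does not matter. A plain loop that tests each word separately would work equally well and may be easier to read for long word lists.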