Perl: Search text file for keywords from array

Question

How do I use keywords from an array in a regex to search a file?

I'm trying to look at a text file and see whether, and where, the keywords appear. There are two files:

keywords.txt
word1
word2
word3

filetosearchon.txt
a lot of words that go on and on and contain line breaks (up to 100000 characters)

I would like to find each keyword and the position of the match. This works for one word, but I can't figure out how to iterate over the keywords in the regex.

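#!/usr/bin/perl
use strict;
use warnings;

# open profanity list (one keyword per line)
open(my $kw_fh, '<', 'keywords.txt') or die("Unable to open keywords.txt: $!");
chomp(my @keywords = <$kw_fh>);    # strip the trailing newline from each keyword
close($kw_fh);

# open text file and slurp it into one string
open(my $txt_fh, '<', 'filetosearchon.txt') or die("Unable to open filetosearchon.txt: $!");
my $txt = do { local $/; <$txt_fh> };    # undef $/ only for this read
close($txt_fh);

my $regex = "keyword";

# record [offset, length, matched text] for every match
my @section;
push @section, [ $-[0], length($1), $1 ]
    while ($txt =~ m/($regex)/g);

foreach my $element (@section)
{
    print join(", ", @$element), "\n";
}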

How can I iterate over the keywords from the array in this while loop to get each matched keyword and its position?

I'd appreciate any help. Thanks.

Answer

One way to do this would be to just build a regex containing every word:

(alpha|bravo|charlie|delta|echo|foxtrot|...|zulu)
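Building that alternation from the keyword array could look something like the sketch below. It is a minimal illustration, not the answerer's code: it assumes whole-word matching (hence the \b anchors), uses quotemeta so keywords containing regex metacharacters are matched literally, and reports each match as [offset, length, keyword] using $-[1], the start offset of capture group 1. The helper names build_keyword_regex and find_keywords are made up for this example, and the hard-coded @keywords stands in for the lines read from keywords.txt.

```perl
use strict;
use warnings;

# Build one alternation regex from a list of keywords (hypothetical helper).
# quotemeta makes regex metacharacters inside a keyword match literally;
# grep { length } drops blank lines, which would otherwise add an empty
# alternative that matches everywhere. The \b anchors assume whole-word matching.
sub build_keyword_regex {
    my @keywords = grep { length } @_;
    die "no keywords given\n" unless @keywords;
    my $alternation = join '|', map { quotemeta } @keywords;
    return qr/\b($alternation)\b/;
}

# Scan $text once and return [offset, length, keyword] for every match.
sub find_keywords {
    my ($text, $regex) = @_;
    my @hits;
    while ($text =~ /$regex/g) {
        push @hits, [ $-[1], length($1), $1 ];    # $-[1] = start offset of group 1
    }
    return @hits;
}

# Stand-in for: chomp(my @keywords = <$kw_fh>);
my @keywords = ('word1', 'word2', 'word3');
my $text     = 'word2 appears before word1, and word2 again';

my $regex = build_keyword_regex(@keywords);
for my $hit (find_keywords($text, $regex)) {
    print join(', ', @$hit), "\n";
}
# prints:
# 0, 5, word2
# 21, 5, word1
# 32, 5, word2
```

Because every keyword lives in a single regex, the while loop walks the text only once and $1 tells you which keyword matched, which is what the question's loop was missing.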

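Perl's regex compiler is pretty smart and will squash an alternation like that down as much as it can (since Perl 5.10, alternations of literal strings are compiled into a trie), so the regex will be more efficient than you might think. See this answer by Tom Christiansen. For example, the following regex: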

(cat|rat|sat|mat)

is effectively compiled to:

(c|r|s|m)at
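To convince yourself that the alternation and the factored form are interchangeable, a quick sketch like the one below compares them on a handful of words. The sample list is arbitrary. If you want to see what the compiler actually produced, running a script with perl -Mre=debug dumps the compiled program; on Perl 5.10 and later the alternation should appear as a TRIE node, though the exact output varies between versions.

```perl
use strict;
use warnings;

# The alternation from the answer and the hand-factored form, anchored so
# that each pattern has to account for the whole string.
my $alternation = qr/^(?:cat|rat|sat|mat)$/;
my $factored    = qr/^(?:c|r|s|m)at$/;

# Arbitrary sample words; any word on which exactly one pattern matches
# would mean the two forms are not equivalent.
my @samples    = qw(cat rat sat mat bat at cats scat);
my @mismatches = grep {
    (($_ =~ $alternation) ? 1 : 0) != (($_ =~ $factored) ? 1 : 0)
} @samples;

my $verdict = @mismatches
    ? "patterns differ on: @mismatches\n"
    : "both patterns agree on every sample\n";
print $verdict;
```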

The factored form is efficient to run. The single-regex approach probably beats the "search for each keyword in turn" approach because it only needs to make one pass over the input string, whereas the naive approach needs one pass per keyword you want to search for.
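The one-pass argument is easy to check against your own data. The sketch below is a rough comparison, not a definitive benchmark: naive_scan and single_pass_scan are hypothetical helpers, the keyword list and text are synthetic, and both helpers assume whole-word matching. It uses cmpthese from the core Benchmark module to run each approach for about one CPU second; the timings depend entirely on your machine and data, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Naive approach: one full scan of $text per keyword.
sub naive_scan {
    my ($text, @keywords) = @_;
    my @hits;
    for my $kw (@keywords) {
        while ($text =~ /\b(\Q$kw\E)\b/g) {
            push @hits, [ $-[1], $1 ];
        }
    }
    # Sort by offset so the result lines up with the single-pass version.
    return sort { $a->[0] <=> $b->[0] } @hits;
}

# Combined approach: one alternation, one scan of $text.
sub single_pass_scan {
    my ($text, @keywords) = @_;
    my $alternation = join '|', map { quotemeta } @keywords;
    my @hits;
    while ($text =~ /\b($alternation)\b/g) {
        push @hits, [ $-[1], $1 ];
    }
    return @hits;
}

# Synthetic data (an assumption; substitute the contents of your own files).
my @keywords = map { "word$_" } 1 .. 50;
my $text = join ' ', map { $_ % 7 ? "filler$_" : "word" . ($_ % 50 + 1) } 1 .. 20_000;

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    naive       => sub { my @h = naive_scan($text, @keywords) },
    single_pass => sub { my @h = single_pass_scan($text, @keywords) },
});
```

Both helpers should report the same hits; only the number of passes over $text differs.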

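By the way, if you're building a profanity filter, as your sample code suggests, remember to account for intentional misspellings: 'pron', 'p0rn', etc. Then there's the fun you can have with Unicode lookalike characters (see http://stackoverflow.com/questions/9491890/is-there-a-list-of-characters-that-look-similar-to-english-letters)!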
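One way to start on the misspelling problem is to let each letter of a keyword also match a few common ASCII substitutions, as in the sketch below. The %lookalike table and the fuzzy_pattern helper are hypothetical and deliberately tiny; a real filter would need a much larger table, and this does nothing for Unicode lookalikes such as Cyrillic letters, which need a separate confusables mapping.

```perl
use strict;
use warnings;

# Hypothetical, deliberately small table of ASCII substitutions per letter.
my %lookalike = (
    a => 'a@4',
    e => 'e3',
    i => 'i1!',
    o => 'o0',
    s => 's5$',
);

# Turn one keyword into a regex in which every letter also accepts its
# listed substitutes, e.g. 'spam' becomes the equivalent of [s5\$]p[a\@4]m.
sub fuzzy_pattern {
    my ($keyword) = @_;
    my $pattern = '';
    for my $ch (split //, lc $keyword) {
        $pattern .= exists $lookalike{$ch}
            ? '[' . quotemeta($lookalike{$ch}) . ']'
            : quotemeta($ch);
    }
    return qr/$pattern/i;    # /i so 'SP4M' is caught as well
}

my $re = fuzzy_pattern('spam');
for my $candidate ('spam', '5p@m', 'SP4M', 'spun') {
    printf "%-5s %s\n", $candidate, ($candidate =~ $re ? 'match' : 'no match');
}
```

The resulting patterns can be joined into a single alternation, as with plain keywords, though heavy use of character classes may limit how much the compiler can optimize them.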