Algorithm for multiple word matching in text


Problem description

I have a large set of words (about 10,000), and I need to find out whether any of those words appear in a given block of text.

Is there a faster algorithm than doing a simple text search of the block for each word in turn?

Recommended answer

Put the 10,000 words into a hash table, then check each word in the block of text to see whether the hash has an entry for it.

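Whether this is faster I don't know; it is just another method, and the answer will depend on how many words you are searching for. The difference in work is that each word of the text costs one hash lookup, instead of the whole text being scanned once per search word.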

Simple Perl example:

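use strict;
use warnings;

my $word_block = "the guy went afk after being popped by a brownrabbit";

# Load the word list (one word per line after __DATA__) into a hash.
my %hash;
while (my $line = <DATA>) {
    chomp $line;
    $hash{$line} = 1;
}

# Split the text on runs of whitespace and look each word up in the hash.
foreach my $word (split /\s+/, $word_block) {
    print "found word: $word\n" if exists $hash{$word};
}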

__DATA__
afk
lol
brownrabbit
popped
garbage
trash
sitdown
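
The Perl example splits on whitespace only, so a word followed by punctuation (for example "brownrabbit.") or written in a different case would not match. As a rough sketch of the same hash-lookup idea, here is a Python version that lowercases the text and pulls out word tokens with a regular expression before checking them against a set. The function name `find_known_words` and the tokenization pattern are assumptions made for illustration; they are not part of the original answer.

```python
import re

def find_known_words(text, vocabulary):
    """Return the vocabulary words found in text, in order of first appearance."""
    # A set is a hash table: membership tests take constant time on average.
    known = {word.lower() for word in vocabulary}
    found = []
    seen = set()
    # Assumed tokenization: runs of letters, digits, and apostrophes count as words.
    for token in re.findall(r"[A-Za-z0-9']+", text.lower()):
        if token in known and token not in seen:
            seen.add(token)
            found.append(token)
    return found

vocabulary = ["afk", "lol", "brownrabbit", "popped", "garbage", "trash", "sitdown"]
text = "The guy went AFK after being popped by a brownrabbit."
print(find_known_words(text, vocabulary))  # ['afk', 'popped', 'brownrabbit']
```

Like the Perl version, this only finds whole-word matches; a search word that appears inside a longer word, or a multi-word phrase, would need a different approach.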
