Aho-Corasick text matching on whole words?


I'm using Aho-Corasick text matching and wonder whether it could be altered to match terms instead of characters. In other words, I want terms, rather than characters, to be the basis of matching. As an example:

Search query: "He",

Sentence: "Hello world",

Aho-Corasick will match "he" against the sentence "hello world", ending at index 2, but I would prefer to get no match. By "terms" I mean words rather than characters.


One way to do this is to run Aho-Corasick as usual and then add a filtering step that eliminates the false positives. Each time you find a match, confirm that the characters immediately before and after it in the input are non-letter characters such as spaces or punctuation, or that the match touches the start or end of the input. That way you keep the speed of the Aho-Corasick lookup but only report matches that appear as whole words in the text.
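As a rough illustration of that filtering step, here is a minimal sketch in Python. Everything in it is an assumption made for the example rather than something from the question: the function names (`build_automaton`, `find_all`, `whole_word_matches`) are hypothetical, the small automaton is hand-rolled so the snippet is self-contained, matching is made case-insensitive by lowercasing, and a "word character" is taken to mean a letter, digit, or underscore.

```python
from collections import deque


def build_automaton(patterns):
    """Build a small Aho-Corasick automaton (trie + failure links)."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def find_all(text, automaton):
    """Yield (start, end, pattern) for every raw match; end is exclusive."""
    goto, fail, out = automaton
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            yield i - len(pat) + 1, i + 1, pat


def is_word_char(ch):
    # Assumption: letters, digits and "_" count as part of a word.
    return ch.isalnum() or ch == "_"


def whole_word_matches(text, patterns):
    """Run the usual search, then drop matches that are not whole words."""
    automaton = build_automaton([p.lower() for p in patterns])
    # Indices refer to the lowercased text (same length for ASCII input).
    lowered = text.lower()
    hits = []
    for start, end, pat in find_all(lowered, automaton):
        before_ok = start == 0 or not is_word_char(lowered[start - 1])
        after_ok = end == len(lowered) or not is_word_char(lowered[end])
        if before_ok and after_ok:
            hits.append((start, end, pat))
    return hits
```

With this sketch, `whole_word_matches("Hello world", ["He"])` returns an empty list, while `whole_word_matches("He said hello", ["He"])` keeps only the match at the start of the sentence. The boundary check costs constant time per raw match, so the overall running time stays linear in the text length plus the number of raw matches.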

Hope this helps!






