是否有一个有效的算法来进行倒全文搜索? [英] Is there an efficient algorithm to perform inverted full text search?

查看:109
本文介绍了是否有一个有效的算法来进行倒全文搜索?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有数千个关键字(在每个关键字的一个或多个字)在数据库中的有限列表。我想高效地查找这些关键字都在给定的输入文字,而不必每次测试这些关键字一个接一个(全表扫描)的。允许以匹配文本一些拼写错误的单词会更好,但不是必需的。任何算法/条建议,以解决这个问题?

I have a limited list of thousands of keywords (of one or more words in each keyword) in a database. I want to efficiently find which of those keywords are in given input text without having to test each of those keywords one by one (full table scan). Allowing to match some misspelled words in the text would be even better but not essential. Any algorithm/article suggestions to solve this?

推荐答案

我觉得有些至今都误以为是被问这个问题的答案。我的理解是,你有话和文字的(相当大的)身体的(相当大的)名单。你想知道哪些字的两个的名单有一个共同点,就是正确的?

I think some of the answers so far are misunderstanding the problem that is being asked. My understanding is that you've got a (largish) list of words and a (largish) body of text. You want to know which words both lists have in common, is that correct?

如果是这样,这是不是一个真正的全文问题都没有。基本上,你只要有字的两个列表(你原来的关键字,词的输入文本列表)。如果你解决这两个列表,你可以通过这两个列表同时扫描并提取是常用的词。

If so, this isn't really a full-text problem at all. Basically, you just have two lists of words (your original keywords, and a list of words from the input text). If you sort both lists you can scan through both lists simultaneously and extract the words that are in common.

假设关键字的列表已经排序,你可以提取的O(N LOGN)时间从正文中排序的话,然后扫描通过这两个列表同时为O(N + M)(其中n是在文本和米的体字的数目是词语的关键字列表)的数

Assuming the list of keywords is already sorted, you can extract and sort the words from the body of text in O(n logn) time, and then scanning through both lists simultaneously is O(n+m) (where n is the number of words in the body of text and m is the number of words in the keyword list).

这篇关于是否有一个有效的算法来进行倒全文搜索?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆