How do search engines merge results from an inverted index?

Question

How do search engines merge results from an inverted index?

For example, if I search for the words "dog" and "bat", the inverted index yields two huge lists, one per word, each naming every document that contains that word.

I doubt that a search engine walks through these lists one document at a time, looking for documents that appear in both. What is done algorithmically to make this merging process blazing fast?

Recommended answer

Actually, search engines do merge these document lists. They get good performance from other techniques, the most important of which is pruning. For example, if the documents for every word are stored in order of decreasing PageRank, then to find the results that have a chance of making the first 10 (the ones shown to the user) you may only need to traverse a fairly small portion of the dog and bat lists, say, the first thousand entries. (And, of course, there's caching, but that isn't part of the query execution algorithm itself.)
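To make the merge-plus-pruning idea concrete, here is a minimal Python sketch. It is not any particular engine's implementation. It assumes every document has one static score (standing in for PageRank) and that every posting list is sorted in the same global order: decreasing score, with ties broken by document ID. The names `Posting`, `order_key`, and `top_k_intersection` are made up for this illustration.

```python
from typing import List, Tuple

# A posting is (static_score, doc_id). Assumption: a document carries the
# same static score in every posting list it appears in.
Posting = Tuple[float, int]


def order_key(p: Posting) -> Tuple[float, int]:
    """Global sort order shared by all lists: best score first, then doc_id."""
    return (-p[0], p[1])


def top_k_intersection(a: List[Posting], b: List[Posting], k: int = 10) -> List[int]:
    """Two-pointer merge of two lists sorted by order_key.

    Returns the doc_ids present in both lists, best score first, and stops
    as soon as k matches are found, so the tails of the lists are never read.
    """
    results: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b) and len(results) < k:
        ka, kb = order_key(a[i]), order_key(b[j])
        if ka == kb:          # same document in both lists
            results.append(a[i][1])
            i += 1
            j += 1
        elif ka < kb:         # a[i] ranks earlier and cannot be in b anymore
            i += 1
        else:                 # b[j] ranks earlier and cannot be in a anymore
            j += 1
    return results


# Toy data: hypothetical scores and document sets.
scores = {1: 0.9, 2: 0.8, 3: 0.7, 4: 0.6, 5: 0.5, 6: 0.4}
dog = sorted(((scores[d], d) for d in [1, 2, 4, 5, 6]), key=order_key)
bat = sorted(((scores[d], d) for d in [2, 3, 5, 6]), key=order_key)

top_two = top_k_intersection(dog, bat, k=2)
```

Because both lists share one ordering, matches come out best-first and the loop can stop after `k` of them. In practice the final ranking also depends on query-specific relevance, which is presumably why the answer talks about reading something like the first thousand entries rather than stopping at exactly ten.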

Besides, there are not that many documents about dogs and about bats: even if there are millions, a good implementation gets through them in a fraction of a second.

P.S. I worked at our country's leading search engine, though not on the core engine of our flagship search product. I talked to its developers and was surprised to learn that the query execution algorithms are actually fairly dumb: it turns out that one can squeeze a huge amount of computation into acceptable time bounds. It is all very heavily optimized, of course, but there's no magic and no miracles.
