Back-end for Auto-complete


Problem description

This is an interview question: design a distributed back-end for auto-complete.

I would answer it as follows:

Auto-complete is a search in a dictionary by a given prefix. The dictionary should probably be organized as a trie. The dictionary is built from the most frequent queries, but that's another story.
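To make the trie idea concrete, here is a minimal sketch of prefix lookup in a trie (the class and method names are my own, not part of the question):

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def complete(self, prefix):
        # Walk down to the node matching the prefix, if any.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        # Collect every word in the subtree below that node.
        results = []
        def collect(n, acc):
            if n.is_word:
                results.append(acc)
            for ch, child in n.children.items():
                collect(child, acc + ch)
        collect(node, prefix)
        return results
```

For example, after inserting "car", "card", "care" and "dog", `complete("car")` returns all three "car*" entries. A production version would also cap the number of results and rank them.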

Now I assume the dictionary is not changed frequently (e.g. once a day rather than every millisecond). Thus we can just replicate the dictionary across a number of servers that handle auto-complete queries (e.g. with a load balancer and round-robin policy).

We should also think about how the dictionary itself is built, but that is also another story.

Does it make sense? Am I missing something?

Answer

Take a look at what SOLR 4.0 does (Solr has tries and is distributed). It depends heavily on how they expect the auto-complete to work. If it's just a wildcard filter (http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory), then something like a trie will be fine for simple ASCII... otherwise it gets more complicated if they want auto-correction. That being said, I doubt a trie will get you good results if it's a generic field (i.e. not a SKU or specialized ID); otherwise you will have a monstrously large and inefficient trie.
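As a lighter-weight alternative to a huge trie, prefix matching can also be done by binary search over a lexicographically sorted list of the frequent queries, ranking the matches by frequency. This is a sketch under my own assumptions (function name and data layout are illustrative, not from the answer):

```python
import bisect

def top_k_completions(queries, freqs, prefix, k):
    """queries: lexicographically sorted list of strings;
    freqs: parallel list of query frequencies."""
    # All strings starting with `prefix` form a contiguous slice.
    lo = bisect.bisect_left(queries, prefix)
    hi = bisect.bisect_left(queries, prefix + "\uffff")
    # Rank the matching slice by descending frequency and keep the top k.
    best = sorted(range(lo, hi), key=lambda i: -freqs[i])[:k]
    return [queries[i] for i in best]
```

Two parallel arrays over a sorted list are compact and cache-friendly, which is one reason real suggesters often prefer sorted/FST-style structures over a naive trie for generic text fields.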
