Back-end for Auto-complete
Question
This is an interview question: design a distributed back-end for auto-complete.
I would answer it as follows:
Auto-complete is a search in a dictionary by a given prefix. The dictionary should probably be organized as a trie. The dictionary is built from the most frequent queries, but that's another story.
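To make the trie idea concrete, here is a minimal Python sketch. It assumes each dictionary entry is a query string paired with a frequency count; `AutocompleteTrie`, `insert` and `complete` are hypothetical names, not part of any library. Completion descends to the node for the typed prefix and then returns the `k` most frequent queries stored beneath it.

```python
import heapq
from typing import Dict, List, Tuple


class TrieNode:
    """One node per character; count > 0 marks the end of a stored query."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count = 0


class AutocompleteTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, query: str, count: int = 1) -> None:
        # Walk (creating as needed) one node per character, then record frequency.
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def complete(self, prefix: str, k: int = 5) -> List[str]:
        # Descend to the node for the prefix; a missing node means no completions.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every stored query below that node together with its frequency.
        found: List[Tuple[int, str]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.count > 0:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        # Highest frequency first; ties broken alphabetically.
        best = heapq.nsmallest(k, found, key=lambda item: (-item[0], item[1]))
        return [text for _, text in best]


trie = AutocompleteTrie()
for q, c in [("car", 50), ("cart", 20), ("carbon", 35), ("cat", 80), ("dog", 10)]:
    trie.insert(q, c)
print(trie.complete("car", k=2))  # ['car', 'carbon']
```

This version scans the whole subtree on every request, which is fine for a sketch; a common refinement is to precompute and store the top-k completions at each node when the dictionary is built, so a lookup only costs the length of the prefix.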
For now I assume the dictionary does not change frequently (e.g. once a day rather than every millisecond). Thus we can simply replicate the dictionary across a number of servers that handle auto-complete queries (e.g. behind a load balancer with a round-robin policy).
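The replication scheme can be sketched as follows, assuming every server holds an identical read-only snapshot of the dictionary. `Replica` and `RoundRobinBalancer` are hypothetical stand-ins for the auto-complete servers and the load balancer; the replica uses a sorted list with binary search instead of a trie and ignores query frequency, just to keep the focus on the distribution side.

```python
import bisect
import itertools
from typing import List


class Replica:
    """Hypothetical auto-complete server holding a read-only dictionary copy."""

    def __init__(self, name: str, terms: List[str]) -> None:
        self.name = name
        self.terms = sorted(terms)  # sorted, so a prefix maps to one contiguous run
        self.served = 0

    def complete(self, prefix: str, k: int = 5) -> List[str]:
        self.served += 1
        # Binary-search to the first term >= prefix, then scan while it matches.
        start = bisect.bisect_left(self.terms, prefix)
        results: List[str] = []
        for term in self.terms[start:]:
            if not term.startswith(prefix) or len(results) == k:
                break
            results.append(term)
        return results


class RoundRobinBalancer:
    """Sends each request to the next replica in turn."""

    def __init__(self, replicas: List[Replica]) -> None:
        self._cycle = itertools.cycle(replicas)

    def complete(self, prefix: str, k: int = 5) -> List[str]:
        return next(self._cycle).complete(prefix, k)


snapshot = ["apple", "application", "apply", "banana"]
replicas = [Replica(f"ac-{i}", snapshot) for i in range(3)]
lb = RoundRobinBalancer(replicas)
for _ in range(6):
    lb.complete("app", k=2)
print([r.served for r in replicas])  # [2, 2, 2]
```

Because the snapshot is read-only, any replica returns the same answer, so the balancer needs no session affinity; the daily rebuild then amounts to building a new snapshot and rolling it out to each replica.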
We should also think about the dictionary, but this is also another story.
Does it make sense? Am I missing something?
Answer
Take a look at what Solr 4.0 does (Solr has tries and is distributed). It depends heavily on how they expect the auto-complete to work. If it's just a wildcard filter, then something like a trie will be fine for simple ASCII... otherwise it gets more complicated, e.g. if they want auto-correction. That being said, I doubt a trie will give you good results for a generic field (i.e. not a SKU or a specialized ID); you would end up with a monstrously large and inefficient trie.
Take a look at:
- Specifically look at its Suggester: http://wiki.apache.org/solr/Suggester
- And Solr's analyzers: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
- Even more specifically: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
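Roughly, an edge n-gram filter indexes the leading substrings of each token so that a typed prefix becomes an exact-match lookup. The plain-Python sketch below illustrates that idea only; it is not Solr's implementation, the function names are hypothetical, and `min_gram`/`max_gram` merely mirror the factory's `minGramSize`/`maxGramSize` parameters (the defaults here are arbitrary).

```python
from collections import defaultdict
from typing import Dict, List, Set


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Front-anchored n-grams: 'solr' -> ['s', 'so', 'sol', 'solr'] for min_gram=1."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def build_index(phrases: List[str], min_gram: int = 1,
                max_gram: int = 15) -> Dict[str, Set[str]]:
    # Index time: lowercase, split on whitespace, and map every edge n-gram
    # of every token back to the phrase that contained it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for phrase in phrases:
        for token in phrase.lower().split():
            for gram in edge_ngrams(token, min_gram, max_gram):
                index[gram].add(phrase)
    return index


def suggest(index: Dict[str, Set[str]], typed: str) -> List[str]:
    # Query time: the typed prefix is matched exactly against the stored grams,
    # so no wildcard expansion is needed.
    return sorted(index.get(typed.lower(), set()))


index = build_index(["Apache Solr", "Solr Suggester", "Lucene"], min_gram=2)
print(suggest(index, "sol"))  # ['Apache Solr', 'Solr Suggester']
```

Here "sol" also matches "Apache Solr" because every token is indexed, not just the start of the phrase. The price is one index entry per gram, which is the same size trade-off the answer warns about for generic fields.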