自动完成服务器端实现 [英] Autocomplete server-side implementation

查看:13
本文介绍了自动完成服务器端实现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在 html 输入框中为自动完成功能实现服务器端组件的快速有效方法是什么?

What is a fast and efficient way to implement the server-side component for an autocomplete feature in an html input box?

我正在编写一个服务来在我们的 Web 界面的主搜索框中自动完成用户查询,并且完成显示在一个 ajax 驱动的下拉列表中.我们运行查询的数据只是我们系统知道的一个大概念表,它与维基百科页面标题集大致匹配.对于这项服务,速度显然是最重要的,因为网页的响应能力对用户体验很重要.

I am writing a service to autocomplete user queries in our web interface's main search box, and the completions are displayed in an ajax-powered dropdown. The data we are running queries against is simply a large table of concepts our system knows about, which matches roughly with the set of wikipedia page titles. For this service obviously speed is of utmost importance, as responsiveness of the web page is important to the user experience.

当前的实现只是将所有概念以有序集合的形式加载到内存中,并对用户击键执行简单的 log(n) 查找.然后使用尾集提供最接近匹配之外的其他匹配.此解决方案的问题在于它无法扩展.它目前运行时遇到了 VM 堆空间限制(我设置了 -Xmx2g,这是我们可以在 32 位机器上推送的最大数量),这阻止了我们扩展我们的概念表或添加更多功能.在具有更多内存的机器上切换到 64 位虚拟机并不是一个直接的选择.

The current implementation simply loads all concepts into memory in a sorted set, and performs a simple log(n) lookup on a user keystroke. The tailset is then used to provide additional matches beyond the closest match. The problem with this solution is that it does not scale. It currently is running up against the VM heap space limit (I've set -Xmx2g, which is about the most we can push on our 32 bit machines), and this prevents us from expanding our concept table or adding more functionality. Switching to 64-bit VMs on machines with more memory isn't an immediate option.

我一直在犹豫是否开始研究基于磁盘的解决方案,因为我担心磁盘寻道时间会降低性能.是否有可能的解决方案可以让我更好地扩展,无论是完全在内存中还是通过一些快速的磁盘支持实现?

I've been hesitant to start working on a disk-based solution as I am concerned that disk seek time will kill performance. Are there possible solutions that will let me scale better, either entirely in memory or with some fast disk-backed implementations?

@Gandalf:对于我们的用例,重要的是自动完成是全面的,而不仅仅是对用户的额外帮助.至于我们正在完成的,它是一个概念类型对的列表.例如,可能的条目是 [("Microsoft", "Software Company"), ("Jeff Atwood", "Programmer"), ("StackOverflow.com", "Website")].一旦用户从自动完成列表中选择了一个项目,我们就会使用 Lucene 进行完整搜索,但我不确定 Lucene 是否适用于自动完成本身.

@Gandalf: For our use case it is important the the autocompletion is comprehensive and isn't just extra help for the user. As for what we are completing, it is a list of concept-type pairs. For example, possible entries are [("Microsoft", "Software Company"), ("Jeff Atwood", "Programmer"), ("StackOverflow.com", "Website")]. We are using Lucene for the full search once a user selects an item from the autocomplete list, but I am not yet sure Lucene would work well for the autocomplete itself.

@Glen:这里没有使用数据库.当我谈论表格时,我只是指我的数据的结构化表示.

@Glen: No databases are being used here. When I'm talking about a table I just mean the structured representation of my data.

@Jason Day:我对这个问题的最初实现是使用 Trie,但是由于需要大量对象引用,内存膨胀实际上比排序集更糟糕.我将阅读三元搜索树,看看它是否有用.

@Jason Day: My original implementation to this problem was to use a Trie, but the memory bloat with that was actually worse than the sorted set due to needing a large number of object references. I'll read on the ternary search trees to see if it could be of use.

推荐答案

对于这么大的集合,我会尝试使用 Lucene 索引之类的方法来查找您想要的术语,并设置一个在每次击键后重置的计时器任务,0.5 秒的延迟.这样,如果用户快速键入多个字符,它不会在每次笔画时查询索引,只有在用户暂停一秒钟时才会查询.可用性测试会让你知道暂停应该多长时间.

With a set that large I would try something like a Lucene index to find the terms you want, and set a timer task that gets reset after every key stroke, with a .5 second delay. This way if a user types multiple characters fast it doesn't query the index every stroke, only when the user pauses for a second. Useability testing will let you know how long that pause should be.

Timer findQuery = new Timer();
...
public void keyStrokeDetected(..) {
   findQuery.cancel();
   findQuery = new Timer();
   String text = widget.getEnteredText();
   final TimerTask task = new TimerTask() {
      public void run() {
         ...query Lucene Index for matches
      }
   };
   findQuery.schedule(task, 350); //350 ms delay
}

那里有一些伪代码,但这就是想法.此外,如果设置了查询条件,则可以预先创建和优化 Lucene 索引.

Some pseduocode there, but that's the idea. Also if the query terms are set the Lucene Index can be pre-created and optimized.

这篇关于自动完成服务器端实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆