Autocomplete server-side implementation

Question

What is a fast and efficient way to implement the server-side component of an autocomplete feature for an HTML input box?

I am writing a service to autocomplete user queries in our web interface's main search box; the completions are displayed in an Ajax-powered dropdown. The data we query is simply a large table of concepts our system knows about, which roughly matches the set of Wikipedia page titles. Speed is obviously of the utmost importance for this service, because the responsiveness of the page is central to the user experience.

The current implementation simply loads all concepts into memory in a sorted set and performs a simple O(log n) lookup on each user keystroke. The tailset is then used to provide additional matches beyond the closest match. The problem with this solution is that it does not scale. It is currently running up against the VM heap space limit (I've set -Xmx2g, which is about the most we can push on our 32-bit machines), and this prevents us from expanding our concept table or adding more functionality. Switching to 64-bit VMs on machines with more memory isn't an immediate option.
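
For reference, the sorted-set approach described above might look roughly like the following. This is a minimal sketch and not the asker's actual code: the class name `PrefixCompleter`, the `limit` parameter, and the sample titles are assumptions made for illustration, and matching is case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of prefix completion over an in-memory sorted set.
final class PrefixCompleter {
    private final TreeSet<String> concepts = new TreeSet<>();

    PrefixCompleter(Collection<String> titles) {
        concepts.addAll(titles);
    }

    /** Returns up to {@code limit} entries that start with {@code prefix}, in sorted order. */
    List<String> complete(String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix) is a view starting at the first entry >= prefix,
        // so reaching the first candidate costs O(log n).
        for (String candidate : concepts.tailSet(prefix)) {
            if (matches.size() >= limit || !candidate.startsWith(prefix)) {
                break; // entries sharing a prefix are contiguous in sorted order
            }
            matches.add(candidate);
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixCompleter completer = new PrefixCompleter(
            Arrays.asList("Microsoft", "Microscope", "Micronesia", "Jeff Atwood"));
        System.out.println(completer.complete("Micro", 2));
    }
}
```

The lookup itself is cheap; the scaling problem is that every title lives on the heap as a separate `String` plus a tree node, which is where the memory goes.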

I've been hesitant to start working on a disk-based solution as I am concerned that disk seek time will kill performance. Are there possible solutions that will let me scale better, either entirely in memory or with some fast disk-backed implementations?

Edit:

@Gandalf: For our use case it is important that the autocompletion is comprehensive and isn't just extra help for the user. As for what we are completing, it is a list of concept-type pairs. For example, possible entries are [("Microsoft", "Software Company"), ("Jeff Atwood", "Programmer"), ("StackOverflow.com", "Website")]. We are using Lucene for the full search once a user selects an item from the autocomplete list, but I am not yet sure Lucene would work well for the autocomplete itself.

@Glen: No databases are being used here. When I'm talking about a table I just mean the structured representation of my data.

@Jason Day: My original implementation for this problem used a trie, but the memory bloat with that was actually worse than with the sorted set, because it needs a large number of object references. I'll read up on ternary search trees to see if they could be of use.
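
For readers unfamiliar with the structure mentioned here, a ternary search tree stores one character per node with three child links (less, equal, greater) instead of one link per possible character. The sketch below is an illustrative, unbalanced version; the class and method names are assumptions, and whether it actually uses less memory than a sorted set for this data would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal ternary search tree supporting prefix completion.
final class TernarySearchTree {
    private static final class Node {
        final char ch;
        Node lo, eq, hi;   // smaller char, next char of the key, larger char
        boolean isEnd;     // true if a key ends at this node
        Node(char ch) { this.ch = ch; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) return; // empty keys are not supported in this sketch
        root = insert(root, key, 0);
    }

    private Node insert(Node node, String key, int i) {
        char c = key.charAt(i);
        if (node == null) node = new Node(c);
        if (c < node.ch) node.lo = insert(node.lo, key, i);
        else if (c > node.ch) node.hi = insert(node.hi, key, i);
        else if (i < key.length() - 1) node.eq = insert(node.eq, key, i + 1);
        else node.isEnd = true;
        return node;
    }

    /** Returns up to {@code limit} stored keys starting with {@code prefix}, in sorted order. */
    List<String> complete(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        if (prefix.isEmpty()) {
            collect(root, new StringBuilder(), out, limit);
            return out;
        }
        Node node = find(prefix);
        if (node == null) return out;
        if (node.isEnd && limit > 0) out.add(prefix);
        collect(node.eq, new StringBuilder(prefix), out, limit);
        return out;
    }

    // Walks to the node matching the last character of key, or returns null.
    private Node find(String key) {
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = key.charAt(i);
            if (c < node.ch) node = node.lo;
            else if (c > node.ch) node = node.hi;
            else if (i == key.length() - 1) return node;
            else { node = node.eq; i++; }
        }
        return null;
    }

    // In-order traversal: smaller siblings, then this character's subtree, then larger siblings.
    private void collect(Node node, StringBuilder path, List<String> out, int limit) {
        if (node == null || out.size() >= limit) return;
        collect(node.lo, path, out, limit);
        if (out.size() < limit) {
            path.append(node.ch);
            if (node.isEnd) out.add(path.toString());
            collect(node.eq, path, out, limit);
            path.setLength(path.length() - 1);
        }
        collect(node.hi, path, out, limit);
    }
}
```

Each node here still carries three references plus object overhead, so the saving over a plain trie comes from not allocating a child slot per possible character. Inserting keys in already-sorted order produces long `lo`/`hi` chains, so a real implementation would shuffle or balance the insertions.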

Recommended answer

With a set that large I would try something like a Lucene index to find the terms you want, and set a timer task that gets reset after every keystroke, with a delay of around half a second (the sketch below uses 350 ms). This way, if a user types several characters quickly, it doesn't query the index on every keystroke, only when the user pauses briefly. Usability testing will let you know how long that pause should be.

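import java.util.Timer;
import java.util.TimerTask;

Timer findQuery = new Timer();
...
public void keyStrokeDetected(..) {
   // Drop any lookup still waiting from the previous keystroke.
   // A cancelled Timer cannot be reused, so a fresh one is created.
   findQuery.cancel();
   findQuery = new Timer();
   // Capture the text now; it must be final to be used inside the task.
   final String text = widget.getEnteredText();
   final TimerTask task = new TimerTask() {
      @Override
      public void run() {
         // ...query Lucene index for matches of text
      }
   };
   findQuery.schedule(task, 350); // 350 ms delay
}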

That is partly pseudocode, but it shows the idea. Also, if the set of query terms is fixed, the Lucene index can be pre-built and optimized.
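
As a variation on the same idea, the reset-on-keystroke timer could be written with a single reusable scheduler thread rather than a new `Timer` per keystroke. This is only a sketch under assumptions: the `Debouncer` class and the `lookup` callback are hypothetical names, and the callback stands in for whatever actually queries the index.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical debouncer: runs the lookup only after typing has paused.
final class Debouncer {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;
    private final Consumer<String> lookup; // assumed callback, e.g. the index query
    private ScheduledFuture<?> pending;

    Debouncer(long delayMillis, Consumer<String> lookup) {
        this.delayMillis = delayMillis;
        this.lookup = lookup;
    }

    /** Call on every keystroke with the current contents of the input box. */
    synchronized void keyStrokeDetected(String text) {
        if (pending != null) {
            pending.cancel(false); // discard the lookup scheduled by the previous keystroke
        }
        pending = scheduler.schedule(
            () -> lookup.accept(text), delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the scheduler thread; pending lookups are dropped. */
    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

A caller would construct it once, for example `new Debouncer(350, text -> runQuery(text))` where `runQuery` is a placeholder for the real search, and call `keyStrokeDetected` from the input handler.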
