Autocomplete performance and private "magic search"

Problem description

Poor performance of autocomplete fields reduces their usefulness. If the client-side implementation has to call an endpoint that does a heavy database lookup on every keystroke, the response time can easily become frustrating.

One neat approach comes from the AWS Case Study: IMDb. It used to come with a diagram (no longer available), but in a nutshell, a prediction tree is generated and stored for every letter combination that can resolve in a meaningful way. For example, the resolutions for "sta" would include Star Wars, Star Trek and Sylvester Stallone, so that entry is stored, whereas "stb" does not resolve to anything meaningful and is not stored.
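A minimal sketch of that idea, assuming the index is simply a mapping from each prefix to the titles it resolves to. The function name `build_prefix_index`, the word-level matching and the result cap are illustrative choices, not details taken from the case study:

```python
from collections import defaultdict

def build_prefix_index(titles, max_prefix_len=20, max_results=5):
    """Map each word prefix that matches at least one title to its results.

    Prefixes that match nothing never get a key, so they are never stored.
    """
    index = defaultdict(list)
    for title in sorted(titles):
        for word in title.lower().split():
            for end in range(1, min(len(word), max_prefix_len) + 1):
                bucket = index[word[:end]]
                # Skip duplicates (two words of one title can share a prefix)
                # and keep each stored document small.
                if title not in bucket and len(bucket) < max_results:
                    bucket.append(title)
    return dict(index)

titles = ["Star Wars", "Star Trek", "Sylvester Stallone"]
index = build_prefix_index(titles)
print(index.get("sta"))  # the three titles, in sorted order
print(index.get("stb"))  # None: nothing resolves, so nothing is stored
```

Each value in `index` could then be serialized as one small static document, which is what makes the lookup cheap at request time.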

  To get the lowest possible latency, all possible results are
  pre-calculated with a document for every combination of letters in
  search. Each document is pushed to Amazon Simple Storage Service
  (Amazon S3) and thereby to Amazon CloudFront, putting the documents
  physically close to the users. The theoretical number of possible
  searches to calculate is mind-boggling (a 20-character search has
  23 x 10^30 combinations), but in practice, using IMDb's authority on
  movie and celebrity data can reduce the search space to about
  150,000 documents, which Amazon S3 and Amazon CloudFront can
  distribute in just a few hours. IMDb creates indexes in several
  languages with daily updates for datasets of over 100,000 movie and
  TV titles and celebrity names.
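As a rough sanity check on the size of that search space, the quoted figure is consistent with a 37-symbol alphabet. That alphabet (26 letters, 10 digits and a space) is an assumption made here for illustration; the case study does not say which characters it counts:

```python
# Assumption: 26 letters + 10 digits + space. The case study does not
# state the alphabet it used.
ALPHABET_SIZE = 26 + 10 + 1
QUERY_LENGTH = 20
STORED_DOCUMENTS = 150_000  # figure quoted in the case study

theoretical = ALPHABET_SIZE ** QUERY_LENGTH
print(f"theoretical 20-character queries: {theoretical:.2e}")
print(f"fraction actually stored: {STORED_DOCUMENTS / theoretical:.1e}")
```

Under that assumption the count comes out at roughly 2.3e31, so only a vanishingly small fraction of the theoretical queries ever needs a stored document.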

How could a similarly performant experience be achieved with private data, e.g. autocompleting client names, job IDs or invoice numbers? Storing different documents/decision trees for separate users sounds expensive, especially if some of the data (client names?) could be available to multiple users.

Recommended answer

You are right that such a workload requires some special optimizations.

You can use a ready-made search engine such as Apache Lucene or Solr (which wraps Lucene in a REST API).

These engines are optimized for full-text search and can work with private data.

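Steps:


  1. Install Solr (or Lucene).

  2. Design the schema for the stored information (which fields you need, which types of search, and so on).

  3. Load the data into it (with batch operations or on an update basis).

  4. Query it using Solr's query language (similar to a Google search).
    At this point you can add a restriction based on user_id or a similar field, in addition to the original user query parameter, so that private data is not mixed between users.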
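Step 4 can be sketched as a request against Solr's `/select` handler, with the typed text sent as a prefix query in `q` and the per-user restriction sent as a filter query in `fq`. The core name `clients`, the field names `name` and `user_id`, and the local URL are hypothetical; the sketch also assumes `name` is indexed as a lowercased field so that a plain prefix match works:

```python
import re
from urllib.parse import urlencode, urlsplit, parse_qs

# Characters with special meaning in Solr's standard query parser, plus
# whitespace, so that typed text is always treated as a literal term.
SOLR_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/\s])')

def escape_term(text):
    """Backslash-escape query syntax in user-typed text."""
    return SOLR_SPECIAL.sub(r"\\\1", text)

def build_autocomplete_url(base_url, core, user_id, typed, rows=10):
    """Build a select URL: prefix query plus a per-user filter query."""
    params = [
        ("q", f"name:{escape_term(typed.lower())}*"),  # prefix match
        ("fq", f"user_id:{int(user_id)}"),             # privacy restriction
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Hypothetical local Solr instance and core.
url = build_autocomplete_url("http://localhost:8983/solr", "clients", 42, "sta")
print(url)
print(parse_qs(urlsplit(url).query))
```

The filter must be added on the server from the authenticated session, never taken from the client, otherwise one user could query another user's records. For data shared between several users, one option is a multi-valued owner field holding every user ID allowed to see the record, so the shared record is stored once rather than once per user.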
