Autocomplete performance and private "magic search"

Question

Poor performance of autocomplete fields reduces their usefulness. If the client-side implementation has to call an endpoint that does heavy database lookups, the response time can easily become frustrating.

One neat approach comes from the AWS Case Study: IMDb. It used to include a diagram (no longer available), but in a nutshell, a prediction tree is generated and stored for every letter combination that resolves to something meaningful. For example, the resolutions for "sta" would include Star Wars, Star Trek and Sylvester Stallone, and would be stored, whereas "stb" does not resolve to anything meaningful and is not stored.
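
A minimal sketch of the "store only prefixes that resolve" idea, assuming a tiny in-memory catalog and a hypothetical `build_prediction_map` helper. Every prefix of every word in a title is mapped to the titles it can complete, so a prefix such as "stb" that matches nothing is never generated and never stored. This is an illustration, not IMDb's implementation: it ranks by insertion order rather than popularity and only handles single-word prefixes.

```python
from collections import defaultdict

def build_prediction_map(titles, max_prefix_len=20, top_k=10):
    """Map each meaningful prefix to up to top_k completions (hypothetical helper).

    Prefixes are taken from the start of every word in a title, lowercased.
    Prefixes that match nothing are never produced, so they are never stored.
    """
    predictions = defaultdict(list)
    for title in titles:
        seen = set()  # avoid listing the same title twice under one prefix
        for word in title.lower().split():
            for end in range(1, min(len(word), max_prefix_len) + 1):
                prefix = word[:end]
                if prefix in seen:
                    continue
                seen.add(prefix)
                # Insertion order stands in for a real popularity ranking.
                if len(predictions[prefix]) < top_k:
                    predictions[prefix].append(title)
    return dict(predictions)

catalog = ["Star Wars", "Star Trek", "Sylvester Stallone"]
table = build_prediction_map(catalog)
print(table["sta"])    # ['Star Wars', 'Star Trek', 'Sylvester Stallone']
print("stb" in table)  # False
```

Each entry in `table` corresponds to one precomputed answer, so serving a keystroke becomes a single key lookup instead of a database query.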

To get the lowest possible latency, all possible results are pre-calculated with a document for every combination of letters in search. Each document is pushed to Amazon Simple Storage Service (Amazon S3) and thereby to Amazon CloudFront, putting the documents physically close to the users. The theoretical number of possible searches to calculate is mind-boggling (a 20-character search has 23 × 10^30 combinations), but in practice, using IMDb's authority on movie and celebrity data can reduce the search space to about 150,000 documents, which Amazon S3 and Amazon CloudFront can distribute in just a few hours. IMDb creates indexes in several languages with daily updates for datasets of over 100,000 movie and TV titles and celebrity names.
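
The "one document per letter combination" step could look roughly like the sketch below. Several parts are assumptions rather than the case study's design: the key layout `autocomplete/<lang>/<prefix>.json` is hypothetical, the prediction dictionary is hard-coded, and the S3 upload is replaced by writes to a temporary local directory so the example runs offline.

```python
import json
import os
import tempfile
from urllib.parse import quote

def prefix_documents(predictions, lang="en"):
    """Yield (object_key, json_body) pairs, one static document per prefix.

    The key layout "autocomplete/<lang>/<prefix>.json" is a hypothetical
    choice; any scheme that a CDN can cache per URL would work.
    """
    for prefix, results in sorted(predictions.items()):
        key = f"autocomplete/{lang}/{quote(prefix, safe='')}.json"
        body = json.dumps({"q": prefix, "results": results}, ensure_ascii=False)
        yield key, body

def write_locally(docs, root):
    """Stand-in for the upload step: write each document under root."""
    count = 0
    for key, body in docs:
        path = os.path.join(root, *key.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(body)
        count += 1
    return count

predictions = {
    "sta": ["Star Wars", "Star Trek", "Sylvester Stallone"],
    "star": ["Star Wars", "Star Trek"],
}
docs = list(prefix_documents(predictions))
out_dir = tempfile.mkdtemp()
written = write_locally(docs, out_dir)
print(docs[0][0])  # autocomplete/en/sta.json
print(written)     # 2
```

With this layout, the client fetches a URL such as `/autocomplete/en/sta.json` straight from the CDN edge on each keystroke, and no origin server or database is involved at query time.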

How would one achieve a similarly performant experience with private data? E.g. autocompleting client names, job IDs, invoice numbers... Storing separate documents/decision trees for each user sounds expensive, especially if some of the data (client names?) could be available to multiple users.

Answer

You're right that such a workload requires some special optimizations.

You can use a ready-made search engine such as Apache Lucene or Solr (a search server built on Lucene that exposes it through a REST API).

These engines are optimized for full-text search and can work with private data.
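
Lucene and Solr typically make prefix lookups cheap by indexing edge n-grams at write time (Solr's `EdgeNGramFilterFactory`). Privacy can be handled by storing a visibility field on each record and filtering on it at query time. The toy in-memory sketch below imitates both ideas, so a record shared by several users is still stored only once. The class, method and field names are hypothetical, and a real engine does this inside its own index rather than in Python dictionaries.

```python
from collections import defaultdict

class PrivateAutocompleteIndex:
    """Toy inverted index of edge n-grams with per-record visibility (hypothetical)."""

    def __init__(self, min_gram=1, max_gram=15):
        self.min_gram = min_gram
        self.max_gram = max_gram
        self.docs = {}                    # doc_id -> (text, set of allowed user ids)
        self.postings = defaultdict(set)  # edge n-gram -> ids of records containing it

    def _edge_ngrams(self, text):
        # "Acme Corp" -> a, ac, acm, acme, c, co, cor, corp (lowercased, per token)
        for token in text.lower().split():
            for n in range(self.min_gram, min(len(token), self.max_gram) + 1):
                yield token[:n]

    def add(self, doc_id, text, allowed_users):
        # Each record is stored once; sharing it only means listing more users.
        self.docs[doc_id] = (text, set(allowed_users))
        for gram in self._edge_ngrams(text):
            self.postings[gram].add(doc_id)

    def suggest(self, user_id, query, limit=10):
        # Look up the typed prefix, then keep only records this user may see.
        prefix = query.lower().strip()
        hits = [
            self.docs[d][0]
            for d in sorted(self.postings.get(prefix, ()))
            if user_id in self.docs[d][1]
        ]
        return hits[:limit]

index = PrivateAutocompleteIndex()
index.add(1, "Acme Corporation", allowed_users={"alice", "bob"})
index.add(2, "Acme Logistics", allowed_users={"bob"})
index.add(3, "Invoice INV-1042", allowed_users={"alice"})
print(index.suggest("alice", "ac"))  # ['Acme Corporation']
print(index.suggest("bob", "ac"))    # ['Acme Corporation', 'Acme Logistics']
```

The design choice is to pay the n-gram cost once at write time, so each keystroke is a single postings lookup followed by a cheap visibility filter. This avoids building a separate prediction tree per user.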

Steps:

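  1. Install Solr (or Lucene).
  2. Design a schema for storing the information (which fields you need and which kinds of searches you will run).
  3. Load the data into it (via batch operations or incremental updates).
  4. Search using Solr's query language (similar to a Google search). At this point, in addition to the user's raw query, you can add special restrictions based on user_id or any other parameter, so private data does not get mixed up between users.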
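
The filtered query from step 4 could look roughly like the request built below. It uses Solr's standard `/select` parameters (`q`, `fq`, `fl`, `rows`, `wt`). The collection name `clients` and the fields `name_prefix` (assumed to be indexed with edge n-grams), `user_ids`, `id` and `name` are hypothetical schema choices. The sketch only builds the URL and does not contact a server. Putting the user restriction in `fq` rather than `q` keeps it out of relevance scoring, and Solr can cache filter queries, so repeating a per-user filter is typically cheap.

```python
from urllib.parse import urlencode, parse_qs, urlparse

# Characters with special meaning in Solr's standard query syntax (plus space).
SOLR_SPECIAL = set('\\+-!():^[]"{}~*?|&/ ')

def escape_solr(term):
    """Backslash-escape query-syntax characters in user-supplied input."""
    return "".join("\\" + c if c in SOLR_SPECIAL else c for c in term)

def build_suggest_url(base_url, collection, user_id, typed, rows=10):
    """Build a Solr /select URL for one autocomplete request.

    Field names are hypothetical: name_prefix is assumed to hold edge
    n-grams, and user_ids lists every user allowed to see the record.
    """
    params = [
        ("q", f"name_prefix:{escape_solr(typed.lower())}"),
        # Restrict results to the caller's records. user_id should come from
        # the authenticated session on the server, never from client input.
        ("fq", f"user_ids:{escape_solr(str(user_id))}"),
        ("fl", "id,name"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

url = build_suggest_url("http://localhost:8983", "clients", 42, "Sta")
params = parse_qs(urlparse(url).query)
print(params["q"])   # ['name_prefix:sta']
print(params["fq"])  # ['user_ids:42']
```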
