Elasticsearch - use a "tags" index to discover all tags in a given string

Problem description


I have an Elasticsearch v2.x cluster with a "tags" index that contains about 5000 tags: {tagName, tagID}. Given a string, is it possible to query the tags index to get all tags that are found in that string? I want not only exact matches but also fuzzy matches, with enough control that the matching is not too generous. By "not too generous" I mean that a tag should match only if all of its tokens are found within a certain proximity of each other (say, 5 words).

For example, given the string:

Model 22340 Sound Spectrum Analyzer

The following tags should match:

sound analyzer sound spectrum analyzer

BUT NOT

sound meter light spectrum chemical analyzer
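
To make the proximity rule concrete, here is a minimal plain-Python sketch of the criterion described above. It is not Elasticsearch code, it handles exact token matches only (no fuzziness), and the names `tokenize`, `tag_matches` and `window` are made up for illustration. It also assumes that "within 5 words" means the matched tokens fit inside a window of at most 5 consecutive words.

```python
import itertools
import re


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_matches(tag, text, window=5):
    """Return True if every token of `tag` occurs in `text` and some choice of
    one occurrence per token fits inside `window` consecutive words."""
    words = tokenize(text)
    positions = []
    for token in set(tokenize(tag)):
        hits = [i for i, word in enumerate(words) if word == token]
        if not hits:
            return False  # a tag token is missing entirely
        positions.append(hits)
    if not positions:
        return False  # an empty tag never matches
    # Try every combination of one position per tag token.
    return any(
        max(combo) - min(combo) + 1 <= window
        for combo in itertools.product(*positions)
    )


if __name__ == "__main__":
    text = "Model 22340 Sound Spectrum Analyzer"
    for tag in ["sound analyzer", "sound meter", "chemical analyzer"]:
        print(tag, "->", tag_matches(tag, text))
```

The three tags in the usage block are illustrative picks based on the example above, not the full tag list.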

Solution

I don't think it's possible to write a single Elasticsearch query that will accurately auto-tag an arbitrary string. That's basically a reverse query. The most accurate way to match a tag to a document is to construct a query for the tag and then run it against the document. Obviously, this would be terribly inefficient if you had to iterate over every tag to auto-tag each document.
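
As a rough illustration of "construct a query for the tag", the sketch below builds one `match_phrase` query with a `slop` value per tag. This is only one possible choice of query, not something the question prescribes. The field name `text` and the helper names are assumptions. Note also that `slop` counts positional moves, so it only approximates "within 5 words", and `match_phrase` does not do fuzzy term matching.

```python
import json


def build_tag_query(tag_name, field="text", slop=5):
    """Build a phrase query whose terms may be up to `slop` positional moves apart.
    `field` is a hypothetical name for the field that holds the string to tag."""
    return {"match_phrase": {field: {"query": tag_name, "slop": slop}}}


def build_search_bodies(tags, field="text", slop=5):
    """Naive approach: one search body per tag, keyed by tagID.
    With about 5000 tags this means about 5000 searches per string."""
    return {
        tag["tagID"]: {"query": build_tag_query(tag["tagName"], field, slop)}
        for tag in tags
    }


if __name__ == "__main__":
    # Illustrative tags only; the real index holds {tagName, tagID} pairs.
    tags = [
        {"tagName": "sound analyzer", "tagID": 1},
        {"tagName": "sound meter", "tagID": 2},
    ]
    print(json.dumps(build_search_bodies(tags), indent=2))
```

Each body would then be sent as a separate search against the indexed string, which is exactly the per-tag iteration that makes this approach inefficient.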

To do a reverse query, you want to use the Elasticsearch Percolator API:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html

The API is very flexible and lets you register fairly complex queries against documents with multiple fields.

The basic concept is this (assuming your tags have an app specific ID field):

  1. For each tag, create a query for it, and register the query with the percolator (using the tag's ID field).

  2. To auto-tag a string, pass your string (as a document) to the Percolator, which will match it against all registered queries.

  3. Iterate over the matches. Each match includes the _id of the query. Use the _id to reference the tag.
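
The three steps above can be sketched as plain request builders. This sketch assumes the Elasticsearch 2.x percolator conventions: queries are registered under the `.percolator` type, and a document is percolated through the `_percolate` endpoint. The index name `tags`, type name `doc`, field name `text`, the `match_phrase`/`slop` query and all helper names are illustrative assumptions. The code only builds the requests and reads a response; sending them over HTTP is left to whichever client you use.

```python
import json


def registration_request(index, tag_id, tag_name, field="text", slop=5):
    """Step 1: register one query per tag, using the tag's ID as the query _id."""
    path = "/{}/.percolator/{}".format(index, tag_id)
    body = {"query": {"match_phrase": {field: {"query": tag_name, "slop": slop}}}}
    return "PUT", path, body


def percolate_request(index, doc_type, text, field="text"):
    """Step 2: pass the string to the percolator as a document."""
    path = "/{}/{}/_percolate".format(index, doc_type)
    return "GET", path, {"doc": {field: text}}


def matched_tag_ids(response):
    """Step 3: each match carries the _id of a registered query, i.e. the tag ID."""
    return [match["_id"] for match in response.get("matches", [])]


if __name__ == "__main__":
    requests_to_send = [
        registration_request("tags", 1, "sound analyzer"),
        percolate_request("tags", "doc", "Model 22340 Sound Spectrum Analyzer"),
    ]
    for method, path, body in requests_to_send:
        print(method, path, json.dumps(body))
```

The index is assumed to already have a mapping for the `text` field so that the registered queries can be parsed. If you expect many tags to match one string, check the percolate request's `size` option for your version, since it caps the number of matches returned.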

This is also a good article to read: https://www.elastic.co/blog/percolator-redesign-blog-post
