Elasticsearch - Tokenization and Multi Match Query

Problem Description

I need to perform tokenization and a multi-match query in a single query in Elasticsearch.

Currently, 1) I use the analyzer to get the tokens, as below:

String text = "..."; // placeholder for the 4 lines of log data
List<AnalyzeToken> analyzeTokenList = new ArrayList<AnalyzeToken>();
AnalyzeRequestBuilder analyzeRequestBuilder = this.client.admin().indices().prepareAnalyze();
for (String newIndex : newIndexes) {
    // one _analyze round trip per index, using the configured analyzer
    analyzeRequestBuilder.setIndex(newIndex);
    analyzeRequestBuilder.setText(text);
    analyzeRequestBuilder.setAnalyzer(analyzer);
    Response analyzeResponse = analyzeRequestBuilder.get();
    analyzeTokenList.addAll(analyzeResponse.getTokens());
}

Then I iterate through the AnalyzeTokens and collect the list of terms:

List<String> tokens = new ArrayList<String>();
for (AnalyzeToken token : analyzeTokenList) {
    // collapse any runs of whitespace inside the term to a single space
    tokens.add(token.getTerm().replaceAll("\\s+", " "));
}

Then I use the tokens to build the multi-match query, as below:

// join with spaces; plain concatenation would glue all tokens into one word
String query = String.join(" ", tokens);

MultiMatchQueryBuilder multiMatchQueryBuilder = new MultiMatchQueryBuilder(query, "abstract", "title");
Iterable<Document> result = documentRepository.search(multiMatchQueryBuilder);

Based on the result, I check whether similar data already exists in the database.

Is it possible to combine the analyze step and the multi-match query into a single query? Any help is appreciated!

EDIT: Problem statement: say I have 90 entries in one index, in which every 10 entries are near-identical (not exactly the same, but about a 70% match), so I will have 9 groups. I need to process only one entry from each group, so I went with the following approach (it is not a good way, but it is what I have ended up with for now):

Approach

  1. Get each of the 90 entries in the index.
  2. Tokenize it using the analyzer (this removes the unwanted keywords).
  3. Search the same index to check whether the same kind of data is already there, filtering on the processed flag. This flag is updated after the first log of a group gets processed.
  4. If no similar data (70% match) is flagged as processed, I process these logs and set the current log's flag to processed.
  5. If similar data already exists with the flag set to processed, I consider this data already processed and continue with the next entry.
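The five steps above can be sketched as an in-memory Java loop. This is only an illustration of the control flow under stated assumptions: `tokenize()` is a hypothetical stand-in for the Elasticsearch analyzer (lowercasing, splitting on non-word characters, dropping a made-up stop-word list), and "70% match" is modelled as the share of an entry's tokens that also occur in an already processed entry, loosely mirroring `minimum_should_match: "70%"`. In the real setup, step 3 is a search against the index rather than a scan of a list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the "process one entry per group" loop.
// Hypothetical: tokenize() stands in for the Elasticsearch analyzer, and
// overlap() stands in for a multi_match search with minimum_should_match.
public class DedupSketch {
    // Made-up stop words playing the role of the "unwanted keywords".
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "at", "on", "of");

    static Set<String> tokenize(String text) {
        Set<String> tokens = new HashSet<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Fraction of the query tokens that also occur in the candidate entry.
    static double overlap(Set<String> query, Set<String> candidate) {
        if (query.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String t : query) {
            if (candidate.contains(t)) {
                hits++;
            }
        }
        return (double) hits / query.size();
    }

    // Returns the positions of the entries that actually get processed.
    static List<Integer> processOnePerGroup(List<String> entries, double threshold) {
        // step 2, done once up front: tokenize every entry
        List<Set<String>> tokenized = new ArrayList<>();
        for (String entry : entries) {
            tokenized.add(tokenize(entry));
        }
        boolean[] processed = new boolean[entries.size()];   // the "processed" flag
        List<Integer> handled = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {           // step 1: take each entry
            boolean similarDone = false;
            for (int j = 0; j < entries.size(); j++) {       // step 3: search, filtered on the flag
                if (processed[j] && overlap(tokenized.get(i), tokenized.get(j)) >= threshold) {
                    similarDone = true;
                    break;
                }
            }
            if (!similarDone) {                              // step 4: process and set the flag
                handled.add(i);
                processed[i] = true;
            }                                                // step 5: otherwise skip it
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> logs = List.of(
                "Connection timeout at db01 on port 5432",
                "Connection timeout at db02 on port 5432",
                "Disk full on volume /var",
                "Disk full on volume /tmp");
        System.out.println(processOnePerGroup(logs, 0.7)); // prints [0, 2]
    }
}
```

With the sample logs in `main`, entries 0 and 2 are processed, while entries 1 (0.8 overlap) and 3 (0.75 overlap) are skipped as near-duplicates of an already processed entry.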

So the ideal goal is to process only one entry out of each group of 10 similar entries.

Thanks,
Harry

Recommended Answer

Multi-match queries internally use match queries, which are analyzed: if no analyzer is specified in the query, they apply the same analyzer that is defined in the field's mapping (the standard analyzer if none is defined).
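To make the analyzer-selection rule concrete, here is a simplified Java model of the order in which a match or multi_match query picks its search-time analyzer. It is a sketch, not Elasticsearch code: it covers only the query's `analyzer` parameter, the field's `search_analyzer`, the field's `analyzer`, and the built-in `standard` fallback, and deliberately leaves out index-level default settings. The analyzer name `log_analyzer` is hypothetical.

```java
// Simplified model of how a match / multi_match query chooses its search-time analyzer.
// Index-level default settings are deliberately left out; a null argument means "not set".
public class AnalyzerResolution {
    static String searchAnalyzer(String queryAnalyzer, String fieldSearchAnalyzer, String fieldAnalyzer) {
        if (queryAnalyzer != null) {
            return queryAnalyzer;          // "analyzer" parameter on the query itself
        }
        if (fieldSearchAnalyzer != null) {
            return fieldSearchAnalyzer;    // "search_analyzer" in the field mapping
        }
        if (fieldAnalyzer != null) {
            return fieldAnalyzer;          // "analyzer" in the field mapping
        }
        return "standard";                 // built-in default analyzer
    }

    public static void main(String[] args) {
        // Hypothetical mapping: the field uses log_analyzer and has no search_analyzer.
        System.out.println(searchAnalyzer(null, null, "log_analyzer"));         // prints log_analyzer
        System.out.println(searchAnalyzer("whitespace", null, "log_analyzer")); // prints whitespace
        System.out.println(searchAnalyzer(null, null, null));                   // prints standard
    }
}
```

This is why pre-tokenizing in application code is redundant: whatever text is passed to the query is run through one of these analyzers anyway.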

From the multi-match query documentation:


The multi_match query builds on the match query to allow multi-field queries:

Also, accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, as explained in match query.

So what you are trying to do is overkill. Even if you want to change the analyzer (because you need different tokens at search time), you can set a search analyzer instead of creating the tokens yourself and then using them in a multi-match query.
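Following the answer, a minimal sketch of a single request: the raw log text goes directly into `multi_match` with an `analyzer` parameter, so Elasticsearch tokenizes it at search time, and a `bool` filter keeps only entries already flagged as processed. The analyzer name `log_analyzer` and the boolean `processed` field are hypothetical, and the code only builds the JSON body with the standard library; it does not send it.

```java
// Builds one search request body that replaces the separate _analyze call.
// Hypothetical names: the analyzer passed in (e.g. log_analyzer) and the boolean "processed" field.
public class SingleQueryBody {
    // Minimal JSON string escaping for quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // multi_match analyzes rawText itself; the bool filter keeps only processed entries.
    static String build(String rawText, String analyzer, String minimumShouldMatch) {
        return "{\"query\":{\"bool\":{"
                + "\"must\":[{\"multi_match\":{"
                + "\"query\":" + jsonString(rawText) + ","
                + "\"fields\":[\"abstract\",\"title\"],"
                + "\"analyzer\":" + jsonString(analyzer) + ","
                + "\"minimum_should_match\":" + jsonString(minimumShouldMatch)
                + "}}],"
                + "\"filter\":[{\"term\":{\"processed\":true}}]"
                + "}}}";
    }

    public static void main(String[] args) {
        String logText = "ERROR db01\nconnection timeout"; // stand-in for the 4-line log entry
        System.out.println(build(logText, "log_analyzer", "70%"));
    }
}
```

With the Java client used in the question, the same query can be expressed as `QueryBuilders.boolQuery().must(QueryBuilders.multiMatchQuery(text, "abstract", "title").analyzer("log_analyzer").minimumShouldMatch("70%")).filter(QueryBuilders.termQuery("processed", true))`, which removes the separate `_analyze` round trip per index.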
