ElasticSearch - Java API search returns no results without (*) in the input query


Question

I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am trying to search for it with the patterns below.

Code: MS-VMA1615-0D

Input : *VMA1615-0*     -- returns MS-VMA1615-0D
Input : MS-VMA1615-0D   -- returns MS-VMA1615-0D
Input : *VMA1615-0      -- returns MS-VMA1615-0D
Input : *VMA*-0*        -- returns MS-VMA1615-0D

But with the input below, I get no results.

Input : VMA1615         -- returns nothing

I would like this input to return the code MS-VMA1615-0D.

Here is the Java code I am using:

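import java.io.IOException;
import java.io.UncheckedIOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

private final String INDEX = "products";
private final String TYPE = "doc";

SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

// "code" holds the user's input (e.g. "VMA1615") and is parsed as a query_string query
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");
searchSourceBuilder.query(qsQueryBuilder);

searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse;
try {
    // SearchEngineClient is not shown in the question; presumably a wrapper around the Elasticsearch client
    searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
    // Fail loudly: discarding the exception would leave searchResponse null and cause an NPE below
    throw new UncheckedIOException("Search failed: " + e.getLocalizedMessage(), e);
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();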

Here are my mapping details:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}


Recommended answer

To do what you're looking for, you will probably have to change the tokenizer. You are currently using the whitespace tokenizer, which keeps "MS-VMA1615-0D" as a single token because it contains no spaces. Replace it with the pattern tokenizer, whose default pattern (\W+) splits on non-word characters such as the hyphen. Your new mapping should look like this:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

After changing the mapping, a query for VMA1615 will return MS-VMA1615-0D. Note that the analyzer of an existing field cannot be changed in place, so the index has to be recreated and the documents reindexed.

This works because the pattern tokenizer splits the string "MS-VMA1615-0D" into "MS", "VMA1615" and "0D". So whenever your query contains any one of them, the document is returned.

POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}

will return:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
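
To see why the original mapping misses VMA1615 and the new one finds it, the two tokenizers can be roughly emulated in plain Java. This is only a sketch, assuming that splitting on \s+ approximates the whitespace tokenizer and splitting on \W+ approximates the pattern tokenizer's default; the class and method names are hypothetical, and Elasticsearch itself is not involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class TokenizerComparison {

    // Splits on the given regex, drops empty pieces, and lowercases each token
    // (standing in for the "lowercase" filter in the mapping).
    public static List<String> tokenize(String text, String splitRegex) {
        return Arrays.stream(text.split(splitRegex))
                .filter(piece -> !piece.isEmpty())
                .map(piece -> piece.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        // The query text goes through the same analysis, so it is lowercased too.
        String queryTerm = "VMA1615".toLowerCase(Locale.ROOT);

        List<String> whitespaceTokens = tokenize(code, "\\s+"); // assumption: whitespace tokenizer
        List<String> patternTokens = tokenize(code, "\\W+");    // assumption: default pattern tokenizer

        System.out.println(whitespaceTokens + " contains query term: " + whitespaceTokens.contains(queryTerm));
        System.out.println(patternTokens + " contains query term: " + patternTokens.contains(queryTerm));
    }
}
```

With the whitespace split, the only stored term is the whole code, so a lookup for the bare middle part has nothing to match; with the \W+ split, the middle part is a term of its own.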

Based on your comment:

That is not how Elasticsearch works. Elasticsearch stores terms and their corresponding documents in an inverted index, and the terms that get stored depend on how the text is tokenized. With whitespace-based tokenization, as in your mapping, the text "Hi there I am a technocrat" is split into ["Hi", "there", "I", "am", "a", "technocrat"]. If I then query for "technocrat", I get a result, because the inverted index has that term associated with my document. In your case, "VMA" is not stored as a term, so a query for it cannot match.
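
The idea can be illustrated with a toy inverted index in Java. This is a minimal sketch, not how Lucene or Elasticsearch is implemented: TinyInvertedIndex and its methods are hypothetical names, tokenization is a plain whitespace split, and no filters are applied.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class TinyInvertedIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index a document: split on whitespace and record the doc id under each term.
    public void index(int docId, String text) {
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Exact term lookup: only terms that were stored at index time can match.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.index(1, "Hi there I am a technocrat");
        index.index(2, "MS-VMA1615-0D");

        System.out.println(index.search("technocrat")); // document 1 is listed under this term
        System.out.println(index.search("VMA"));        // empty: "VMA" was never stored as a term
    }
}
```

A wildcard query such as *VMA* still works against such an index because it scans the stored terms for ones that match the pattern, which is why the inputs with (*) returned results.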

To match on "VMA" alone, use the mapping below, which splits on hyphens and on digits:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

To check:

POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}

will produce:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
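
The tokens and offsets above can be reproduced outside Elasticsearch with java.util.regex. The following is only a sketch of the splitting logic, assuming the pattern is treated as a separator and empty tokens are dropped; PatternTokenOffsets and its methods are hypothetical names, not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTokenOffsets {

    // Emits "token[start,end]" for every non-empty run of text between separator matches.
    public static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        Matcher separators = Pattern.compile(separatorRegex).matcher(text);
        int tokenStart = 0;
        while (separators.find()) {
            addToken(tokens, text, tokenStart, separators.start());
            tokenStart = separators.end();
        }
        addToken(tokens, text, tokenStart, text.length()); // trailing token after the last separator
        return tokens;
    }

    private static void addToken(List<String> tokens, String text, int start, int end) {
        if (end > start) { // skip empty tokens between adjacent separators, e.g. inside "1615-0"
            tokens.add(text.substring(start, end) + "[" + start + "," + end + "]");
        }
    }

    public static void main(String[] args) {
        // Same separator as my_pattern_tokenizer: a hyphen or a single digit.
        System.out.println(tokenize("MS-VMA1615-0D", "-|\\d")); // prints [MS[0,2], VMA[3,6], D[12,13]]
    }
}
```

Because every digit acts as a separator, the numeric part of the code is discarded entirely, so with this mapping a query for 1615 on its own would have no term to match.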
