How to search for a part of a word with ElasticSearch
Question
I've recently started using ElasticSearch and I can't seem to make it search for a part of a word.
Example: I have three documents from my CouchDB indexed in ElasticSearch:
{
    "_id": "1",
    "name": "John Doeman",
    "function": "Janitor"
}
{
    "_id": "2",
    "name": "Jane Doewoman",
    "function": "Teacher"
}
{
    "_id": "3",
    "name": "Jimmy Jackal",
    "function": "Student"
}
So now, I want to search for all documents containing "Doe":
curl http://localhost:9200/my_idx/my_type/_search?q=Doe
That doesn't return any hits. But if I search for
curl http://localhost:9200/my_idx/my_type/_search?q=Doeman
It does return one document (John Doeman).
I've tried setting different analyzers and different filters as properties of my index. I've also tried using a full-blown query, for example:
{
    "query": {
        "term": {
            "name": "Doe"
        }
    }
}
) But nothing seems to work.
How can I make ElasticSearch find both John Doeman and Jane Doewoman when I search for "Doe"?
UPDATE
I tried to use the nGram tokenizer and filter, as Igor proposed, like this:
{
    "index": {
        "index": "my_idx",
        "type": "my_type",
        "bulk_size": "100",
        "bulk_timeout": "10ms",
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "my_ngram_tokenizer",
                    "filter": [
                        "my_ngram_filter"
                    ]
                }
            },
            "filter": {
                "my_ngram_filter": {
                    "type": "nGram",
                    "min_gram": 1,
                    "max_gram": 1
                }
            },
            "tokenizer": {
                "my_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 1,
                    "max_gram": 1
                }
            }
        }
    }
}
The problem I'm having now is that each and every query returns ALL documents. Any pointers? The ElasticSearch documentation on using nGram isn't great...
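That behavior is probably a consequence of setting both min_gram and max_gram to 1: every word in the index collapses to single letters, so any query that shares even one letter with a document matches it. A small Python sketch of the n-grams such a filter would emit (an illustration, not the actual filter code):

```python
def ngrams(text, min_gram, max_gram):
    # Generate the n-grams an nGram-style filter would emit
    # for a single lowercased token.
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }

# With min_gram = max_gram = 1 every word collapses to single letters...
print(sorted(ngrams("Doeman", 1, 1)))  # ['a', 'd', 'e', 'm', 'n', 'o']

# ...so any query sharing a single letter with a document matches it.
query = ngrams("Doe", 1, 1)                   # {'d', 'o', 'e'}
print(bool(query & ngrams("Janitor", 1, 1)))  # True -- the shared 'o' is enough
```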
Recommended answer
I'm using nGram, too. I use the standard tokenizer and nGram just as a filter. Here is my setup:
{
    "index": {
        "index": "my_idx",
        "type": "my_type",
        "analysis": {
            "index_analyzer": {
                "my_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "mynGram"
                    ]
                }
            },
            "search_analyzer": {
                "my_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase",
                        "mynGram"
                    ]
                }
            },
            "filter": {
                "mynGram": {
                    "type": "nGram",
                    "min_gram": 2,
                    "max_gram": 50
                }
            }
        }
    }
}
This lets you find word parts up to 50 letters long. Adjust max_gram as needed. In German, words can get really long, so I set it to a high value.
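With min_gram 2 and max_gram 50 on top of the standard tokenizer, "Doeman" gets expanded into all substrings of length 2 and up, one of which is "doe", so the original search now matches. A Python sketch of what that filter configuration produces (an illustration of the idea, not Elasticsearch internals):

```python
def ngrams(text, min_gram, max_gram):
    # n-grams the filter would emit for one lowercased token,
    # clamping max_gram to the token length.
    text = text.lower()
    return {
        text[i:i + n]
        for n in range(min_gram, min(max_gram, len(text)) + 1)
        for i in range(len(text) - n + 1)
    }

# The 2..50 grams of "Doeman" include the substring the question needs:
grams = ngrams("Doeman", 2, 50)
print("doe" in grams)  # True
print(sorted(g for g in grams if len(g) == 3))  # ['doe', 'ema', 'man', 'oem']
```

Because the grams are built from the lowercased token, a lowercased query for "Doe" now hits both "Doeman" and "Doewoman" but not "Jackal".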