ElasticSearch RegExp Filter regex dash

Question

I have a few documents in my ElasticSearch v1.2.1 index, like:

{
  "tempSkipAfterSave": "false",
  "variation": null,
  "images": null,
  "name": "Dolce & Gabbana Short Sleeve Coat",
  "sku": "MD01575254-40-WHITE",
  "user_id": "123foo",
  "creation_date": null,
  "changed": 1
}

where sku can be a variation such as MD01575254-40-BlUE or MD01575254-38-WHITE.

I can get my Elasticsearch query to work with this:

{
  "size": 1000,
  "from": 0,
  "filter": {
    "and": [
      {
        "regexp": {
          "sku": "md01575254.*"
        }
      },
      {
        "term": {
          "user_id": "123foo"
        }
      },
      {
        "missing": {
          "field": "project_id"
        }
      }
    ]
  },
  "query": {
    "match_all": {}
  }
}    

This returns all the variations of sku: MD01575254*

However, the dash '-' is really screwing me up.

When I change the regexp to:

"regexp": {
  "sku": "md01575254-40.*"
}

I can't get any results back. I've also tried:

  • "sku":"md01575254-40.*"
  • "sku":"md01575254 \ -40.*"
  • "sku":"md01575254-40-.*"
  • ...

I just can't seem to make it work. What am I doing wrong here?

Recommended answer

Problem:

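This is because the default analyzer usually tokenizes at -, so your field is most likely saved as separate terms rather than one string (the default analyzer also lowercases each term, which is why the lowercase pattern md01575254.* matched). A regexp filter is applied to each term individually, so no single term can match a pattern containing the dash. The terms look like: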

  • MD01575254
  • 40
  • BlUE
  • MD01575254
  • 40
  • BlUE
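
To see why the dash pattern fails, here is a minimal Python sketch. It assumes a simplified stand-in for the default analyzer (split on non-alphanumeric characters, then lowercase) and uses `re.fullmatch` to mimic a regexp filter having to match an entire indexed term. `analyze` and `regexp_matches` are hypothetical helper names for illustration, not Elasticsearch APIs.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def regexp_matches(pattern, terms):
    """A regexp filter matches a document when the pattern matches a whole term."""
    return any(re.fullmatch(pattern, term) for term in terms)

terms = analyze("MD01575254-40-WHITE")
print(terms)                                      # ['md01575254', '40', 'white']
print(regexp_matches(r"md01575254.*", terms))     # True: matches the first term
print(regexp_matches(r"md01575254-40.*", terms))  # False: no term contains a dash
```

The dash never reaches the index in this model, so escaping it in the pattern cannot help.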

Solution:

You can update your mapping to add a sku.raw field that is not analyzed when indexed. This will require you to delete the existing data and re-index it.

{
  "<type>" : {
    "properties" : {
      ...,
      "sku" : {
        "type": "string",
        "fields" : {
          "raw" : {"type" : "string", "index" : "not_analyzed"}
        }
      }
    }
  }
}

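Then you can query this new field, which is not analyzed. Because the raw value is stored exactly as it was indexed, the pattern has to match its case (upper case for these SKUs):

{
  "query" : {
    "regexp" : {
      "sku.raw": "MD01575254-40.*"
    }
  }
}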
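
As a rough illustration of how matching changes on the raw field, the sketch below treats the not_analyzed value as a single unmodified term and again uses `re.fullmatch` as a stand-in for the regexp query. The `matches` helper is hypothetical. Note that a not_analyzed field keeps the original case, so a lowercase pattern would not match an upper-case SKU.

```python
import re

# not_analyzed: the whole SKU is stored as one term, unchanged
raw_term = "MD01575254-40-WHITE"

def matches(pattern, term):
    """Stand-in for a regexp query: the pattern must match the whole term."""
    return re.fullmatch(pattern, term) is not None

print(matches(r"MD01575254-40.*", raw_term))  # True: dash and case both line up
print(matches(r"md01575254-40.*", raw_term))  # False: the raw term is upper case
print(matches(r"MD01575254-38.*", raw_term))  # False: different variation
```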


HTTP endpoints:

The API to delete your current mapping and data is:

DELETE http://localhost:9200/<index>/<type>

The API to add your new mapping, with the raw SKU field, is:

PUT http://localhost:9200/<index>/<type>/_mapping
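
For reference, the two calls could be prepared from Python roughly as follows. This is only a sketch using the standard library: the index name `products` and type name `product` are placeholders I made up for `<index>` and `<type>`, and the requests are built but deliberately not sent.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"
INDEX, DOC_TYPE = "products", "product"  # hypothetical names; substitute your own

# Mapping from the answer: sku stays analyzed, sku.raw is not analyzed
mapping = {
    DOC_TYPE: {
        "properties": {
            "sku": {
                "type": "string",
                "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
            }
        }
    }
}

# Build (but do not send) the DELETE and PUT requests
delete_req = Request(f"{BASE}/{INDEX}/{DOC_TYPE}", method="DELETE")
put_req = Request(
    f"{BASE}/{INDEX}/{DOC_TYPE}/_mapping",
    data=json.dumps(mapping).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

for req in (delete_req, put_req):
    print(req.get_method(), req.full_url)
```

To actually run them, each request would be passed to `urllib.request.urlopen`. Keep in mind that the DELETE removes the existing documents of that type, so they need to be re-indexed afterwards.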


Links:

  • multiple fields in mapping
  • analyzers
