Elasticsearch: is there a way to declare all (possibly dynamic) subfields of an object field as string?


Problem description

I have a doc_type with a mapping similar to this very simplified one:

{
   "test":{
      "properties":{
         "name":{
            "type":"string"
         },
         "long_searchable_text":{
            "type":"string"
         },
         "clearances":{
            "type":"object"
         }
      }
   }
}

The field clearances should be an object, with a series of alphanumeric identifiers for filtering purposes. A typical document will have this format:

{
    "name": "Lord Macbeth",
    "long_searchable_text": "Life's but a walking shadow, a poor player, that...",
    "clearances": {
        "glamis": "aa2862jsgd",
        "cawdor": "3463463551"
    }
}

The problem is that sometimes during indexing, the first value indexed for a new field inside the clearances object is entirely numeric, as with the cawdor field above. This causes Elasticsearch to infer the type of this field as long. But this is an accident: the field might be alphanumeric in another document. When a later document containing an alphanumeric value in this field arrives, I get a parsing exception:

{"error":"MapperParsingException[failed to parse [clearances.cawdor]]; nested: NumberFormatException[For input string: \"af654hgss1\"]; ","status":400}

I tried to solve this with a dynamic template defined like this:

{
   "test":{
      "properties":{
         "name":{
            "type":"string"
         },
         "long_searchable_text":{
            "type":"string"
         },
         "clearances":{
            "type":"object"
         }
      }
   },
   "dynamic_templates":[
      {
         "source_template":{
            "match":"clearances.*",
            "mapping":{
               "type":"string",
               "index":"not_analyzed"
            }
         }
      }
   ]
}

But the problem persists: if the first indexed document has a clearances.some_subfield value that can be parsed as an integer, the subfield is inferred as an integer, and all subsequent documents with alphanumeric values in that subfield fail to be indexed.

I could list all current subfields in the mapping, but there are many of them and I expect their number to grow in the future (triggering an update of the mapping and the need for a full reindex...).

Is there a way to make this work without resorting to a full reindex every time a new subfield is added?

Solution

You're almost there.

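First, your dynamic template's pattern must be clearances.*, and it must be a path_match and not a plain match: match is tested only against the field's own name, while path_match is tested against the full dotted path.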
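
To make the difference concrete, here is a minimal Python sketch of how the two keys select a field. It is only an approximation written for illustration: template_applies is a hypothetical helper, not part of any Elasticsearch client, and fnmatchcase stands in for Elasticsearch's simple * wildcard matching.

```python
from fnmatch import fnmatchcase


def template_applies(template: dict, full_path: str) -> bool:
    """Approximate how a dynamic template selects a field (illustrative only).

    `match` is compared with the field's own name (the last path segment);
    `path_match` is compared with the full dotted path.
    """
    name = full_path.rsplit(".", 1)[-1]
    if "match" in template and not fnmatchcase(name, template["match"]):
        return False
    if "path_match" in template and not fnmatchcase(full_path, template["path_match"]):
        return False
    return True


with_match = {"match": "clearances.*"}
with_path_match = {"path_match": "clearances.*"}

# The field name "cawdor" never contains "clearances.", so `match` cannot hit.
print(template_applies(with_match, "clearances.cawdor"))       # False
# The full path "clearances.cawdor" does fit the pattern.
print(template_applies(with_path_match, "clearances.cawdor"))  # True
```

Under this model, the template from the question never applies to any subfield, so type inference falls back to the default behaviour.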

Here's a runnable example: https://www.found.no/play/gist/df030f005da71827ca96

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Create indexes

curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
    "settings": {},
    "mappings": {
        "test": {
            "dynamic_templates": [
                {
                    "clearances_as_string": {
                        "path_match": "clearances.*",
                        "mapping": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            ]
        }
    }
}'


# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"test"}}
{"clearances":{"glamis":1234,"cawdor":5678}}
{"index":{"_index":"play","_type":"test"}}
{"clearances":{"glamis":"aa2862jsgd","cawdor":"some string"}}
'

# Do searches

curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "facets": {
        "cawdor": {
            "terms": {
                "field": "clearances.cawdor"
            }
        }
    }
}
'
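
The effect of the template on the two bulk documents above can also be reasoned about without a running cluster. The following is a toy Python model, not Elasticsearch itself: ToyIndex and its behaviour are assumptions made for illustration, reduced to "the first value seen fixes the field type, unless a path_match pattern forces string".

```python
from fnmatch import fnmatchcase


class ToyIndex:
    """Toy model of dynamic mapping: the first value seen fixes a field's type."""

    def __init__(self, path_match=None):
        self.path_match = path_match  # paths fitting this pattern are forced to "string"
        self.mapping = {}             # full dotted path -> "long" or "string"

    def index(self, doc, prefix=""):
        for key, value in doc.items():
            path = prefix + key
            if isinstance(value, dict):
                self.index(value, path + ".")  # descend into object fields
                continue
            if path not in self.mapping:
                forced = self.path_match is not None and fnmatchcase(path, self.path_match)
                is_number = isinstance(value, int) and not forced
                self.mapping[path] = "long" if is_number else "string"
            if self.mapping[path] == "long":
                int(value)  # raises ValueError for "aa2862jsgd", like the NumberFormatException


docs = [
    {"clearances": {"glamis": 1234, "cawdor": 5678}},
    {"clearances": {"glamis": "aa2862jsgd", "cawdor": "some string"}},
]

# Without a template, the numeric first document pins both subfields to long.
plain = ToyIndex()
plain.index(docs[0])
rejected = False
try:
    plain.index(docs[1])
except ValueError:
    rejected = True
print("second document rejected without template:", rejected)

# With the path_match template, every clearances subfield is a string from the start.
templated = ToyIndex(path_match="clearances.*")
for doc in docs:
    templated.index(doc)
print(templated.mapping)
```

In this model the second document is rejected only when no template is present, which mirrors the parsing exception described in the question.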
