Does mongo-connector support adding fields before inserting into Elasticsearch?

Question

I have many documents in MongoDB. mongo-connector inserts that data into Elasticsearch. Is there a way to add extra fields to a document before it is inserted into ES? Can mongo-connector do this?

UPDATE

Based on your UPDATE 3, I created a mapping like this. Is it correct?

PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": {
          "inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
          "lang": "groovy"
        }
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}

ERROR

This is the error I keep getting when I try to insert your mapping:

{
  "error": {
    "root_cause": [
      {
        "type": "script_parse_exception",
        "reason": "Value must be of type String: [script]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
    "caused_by": {
      "type": "script_parse_exception",
      "reason": "Value must be of type String: [script]"
    }
  },
  "status": 400
}

UPDATE 2

The mapping is now accepted and the request returns acknowledged: true. But when I try to insert JSON data like the document below, it throws an error.

PUT my_index2/my_type2/1
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}

ERROR FOR UPDATE 2

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
        "type": "script_exception",
        "reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
      }
    }
  },
  "status": 400
}

ERROR 1 FOR UPDATE 2

After adding script.inline: true, I tried to insert the data again but got the following error.

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "field must be either [lat], [lon] or [geohash]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "parse_exception",
      "reason": "field must be either [lat], [lon] or [geohash]"
    }
  },
  "status": 400
}

Solution

mongo-connector aims at synchronizing a Mongo database with another target system, such as ES, Solr or another Mongo DB. Synchronizing means 1:1 replication, so there's no way that I know of for mongo-connector to enrich documents during the replication (and it's not its intent either).

However, in ES 5 we'll soon be able to use ingest nodes in which we'll be able to define processing pipelines whose goal is to enrich documents before they get indexed.

UPDATE

There's probably a way by modifying the formatters.py file.

In transform_value I would add a case to handle Geopoint:

    if isinstance(value, dict):
        return self.format_document(value)
    elif isinstance(value, list):
        return [self.transform_value(v) for v in value]

    # handle Geopoint class: keep only the keys a geo_point accepts
    elif isinstance(value, Geopoint):
        return self.format_document({'lat': value['lat'], 'lon': value['lon']})

    ...

UPDATE 2

Let's try another approach by modifying the transform_element function (on line 104):

def transform_element(self, key, value):
    try:
        # add these next two lines
        if key == 'GeoPoint':
            value = {'lat': value['lat'], 'lon': value['lon']}
        # do not modify the initial code below
        new_value = self.transform_value(value)
        yield key, new_value
    except ValueError as e:
        LOG.warn("Invalid value for key: %s as %s"
                 % (key, str(e)))
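
The key-based rewrite above can be tried in isolation before patching formatters.py. The following is a minimal sketch, assuming the field arrives as a plain dict with lat and lon plus extra keys such as alt; to_geo_point is a hypothetical helper name and is not part of mongo-connector.

```python
def to_geo_point(value):
    """Return a new dict holding only the 'lat' and 'lon' keys of value."""
    # Assumption: value is dict-like and always carries 'lat' and 'lon';
    # any other key (for example 'alt' or 'valid') is dropped.
    return {'lat': value['lat'], 'lon': value['lon']}


# Sample document taken from the question; 'alt' is the offending key.
doc = {'geopoint': {'lon': 48.845877, 'lat': 8.821861, 'alt': 0.0}}

# Build a cleaned copy instead of mutating the source document.
cleaned = dict(doc, geopoint=to_geo_point(doc['geopoint']))
print(cleaned)
```

Building a new dict rather than deleting keys in place leaves the original document untouched, which makes the helper easier to test on its own.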

UPDATE 3

Another thing you might try is to add a transform. I did not mention it earlier because it was deprecated in ES 2.0, but in ES 5.0 you'll have ingest nodes and you'll be able to take care of this at ingest time using a remove processor.

You can define your mapping like this:

PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}

Note: make sure to enable dynamic scripting by adding script.inline: true to elasticsearch.yml and restarting your ES node.

What is going to happen is that the alt field will still be visible in the stored _source but it will not be indexed, and hence, no error should occur.

With ES 5, you'd simply create a pipeline with a remove processor, like this:

PUT _ingest/pipeline/geo-pipeline
{
  "description" : "remove unsupported altitude field",
  "processors" : [
    {
      "remove" : {
        "field": "geopoint.alt"
      }
    }
  ]
}
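
To make the effect of that pipeline concrete, here is a rough pure-Python imitation of what removing geopoint.alt does to an incoming document. It is only an illustration and does not call Elasticsearch; remove_field is a hypothetical name, and the choice to raise KeyError when the path is absent is an assumption of this sketch, so check the processor documentation for how the real processor treats missing fields.

```python
import copy


def remove_field(doc, dotted_path):
    """Return a deep copy of doc without the field at dotted_path."""
    result = copy.deepcopy(doc)
    # Split 'geopoint.alt' into parent keys ['geopoint'] and leaf 'alt'.
    *parents, leaf = dotted_path.split('.')
    node = result
    for key in parents:
        node = node[key]  # KeyError if a parent object is missing
    del node[leaf]        # KeyError if the field itself is missing
    return result


# Sample document taken from the question.
incoming = {'geopoint': {'lon': 48.845877, 'lat': 8.821861, 'alt': 0.0}}
indexed = remove_field(incoming, 'geopoint.alt')
print(indexed)
```

After the removal only lat and lon remain under geopoint, which is the shape the geo_point mapping accepted in the error message from the question.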
