Fields not in mapping are included in the search results returned by ElasticSearch


Problem description


I want to index PDF attachments using the Tire gem as a client for ElasticSearch. In my mapping, I exclude the attachment field from _source, so that the attachment is not stored in the index and is not returned in the search results:

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end 

I can still see the attachment content included in the search results when I run the following curl command:

curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
  "query": {
    "query_string": {
      "query": "rspec"
    }
  }
}'

I have posted my question in this thread:

But I have just noticed that not only is the attachment included in the search results; all other fields, including the ones that are not mapped, are included as well, as you can see here:

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.025427073,
    "hits": [
      {
        "_index": "user_files",
        "_type": "user_file",
        "_id": "5",
        "_score": 0.025427073,
        "_source": {
          "user_file": {
            "id": 5,
            "folder_id": 1,
            "updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_size": 179895,
            "attachment_updated_at": "2012-08-16T11:32:41Z",
            "attachment_file_name": "hw4.pdf",
            "attachment_content_type": "application/pdf",
            "created_at": "2012-08-16T11:32:41Z",
            "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
          }
        }
      }
    ]
  }
}

attachment_file_size and attachment_content_type are not defined in the mapping, but are returned in the search results:

{
  "id": 5,
  "folder_id": 1,
  "updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_size": 179895, <---------------------
  "attachment_updated_at": "2012-08-16T11:32:41Z",
  "attachment_file_name": "hw4.pdf",
  "attachment_content_type": "application/pdf", <------------------
  "created_at": "2012-08-16T11:32:41Z",
  "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}
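The mismatch can also be checked mechanically. The following is a minimal Ruby sketch, assuming the declared field list and the _source document are copied by hand from the listings above; `unmapped_fields` is a hypothetical helper for illustration, not part of Tire or ElasticSearch.

```ruby
require 'json'

# Fields declared in the mapping block of the question.
DECLARED_FIELDS = %w[
  id folder_id attachment_file_name attachment_updated_at attachment_original
].freeze

# The document returned under _source, copied from the search result.
SOURCE = JSON.parse(<<~DOC)
  {
    "id": 5,
    "folder_id": 1,
    "updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_size": 179895,
    "attachment_updated_at": "2012-08-16T11:32:41Z",
    "attachment_file_name": "hw4.pdf",
    "attachment_content_type": "application/pdf",
    "created_at": "2012-08-16T11:32:41Z",
    "attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
  }
DOC

# Hypothetical helper: keys present in the document but absent from the mapping.
def unmapped_fields(source, declared)
  source.keys - declared
end

p unmapped_fields(SOURCE, DECLARED_FIELDS)
# => ["updated_at", "attachment_file_size", "attachment_content_type", "created_at"]
```

Besides the two highlighted fields, updated_at and created_at also come back without having been declared in the mapping block.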

Here's my full implementation:

  include Tire::Model::Search
  include Tire::Model::Callbacks

  def self.search(folder, params)
    tire.search() do
      query { string params[:query], default_operator: "AND"} if params[:query].present?
      #filter :term, folder_id: folder.id
      #highlight :attachment_original, :options => {:tag => "<em>"}
      raise to_curl
    end
  end

  mapping :_source => { :excludes => ['attachment_original'] } do
    indexes :id, :type => 'integer'
    indexes :folder_id, :type => 'integer'
    indexes :attachment_file_name
    indexes :attachment_updated_at, :type => 'date'
    indexes :attachment_original, :type => 'attachment'
  end

  def to_indexed_json
    to_json(:methods => [:attachment_original])
  end

  def attachment_original
    if attachment_file_name.present?
      path_to_original = attachment.path
      Base64.encode64(open(path_to_original) { |f| f.read })
    end
  end

Could somebody help me figure out why all the fields are included in the _source?

Edit: This is the output of requesting localhost:9200/user_files/_mapping:

{
  "user_files": {
    "user_file": {
      "_source": {
        "excludes": [
          "attachment_original"
        ]
      },
      "properties": {
        "attachment_content_type": {
          "type": "string"
        },
        "attachment_file_name": {
          "type": "string"
        },
        "attachment_file_size": {
          "type": "long"
        },
        "attachment_original": {
          "type": "attachment",
          "path": "full",
          "fields": {
            "attachment_original": {
              "type": "string"
            },
            "author": {
              "type": "string"
            },
            "title": {
              "type": "string"
            },
            "name": {
              "type": "string"
            },
            "date": {
              "type": "date",
              "format": "dateOptionalTime"
            },
            "keywords": {
              "type": "string"
            },
            "content_type": {
              "type": "string"
            }
          }
        },
        "attachment_updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "created_at": {
          "type": "date",
          "format": "dateOptionalTime"
        },
        "folder_id": {
          "type": "integer"
        },
        "id": {
          "type": "integer"
        },
        "updated_at": {
          "type": "date",
          "format": "dateOptionalTime"
        }
      }
    }
  }
}

As you can see, for some reason all the fields are included in the mapping!

Solution

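In your to_indexed_json, you include the attachment_original method, so its value is sent to ElasticSearch. Because to_json also serializes every other attribute of the model, and ElasticSearch dynamically maps fields it has not seen before, all your other properties end up in the mapping and, consequently, in _source.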
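One way to keep undeclared attributes out of the index is to serialize only the fields named in the mapping. What follows is a minimal sketch in plain Ruby, not the actual model: the `UserFile` class, its `attributes` hash, and the `raw_bytes` argument are stand-ins assumed for illustration. In a real ActiveRecord model the same idea would likely be expressed as to_json(:only => [...], :methods => [:attachment_original]).

```ruby
require 'json'

# Hypothetical stand-in for the ActiveRecord model in the question.
class UserFile
  # Attributes declared in the mapping block; anything else is not serialized.
  MAPPED_FIELDS = %w[id folder_id attachment_file_name attachment_updated_at].freeze

  def initialize(attributes, raw_bytes)
    @attributes = attributes # plays the role of the model's column values
    @raw_bytes  = raw_bytes  # plays the role of the file read from disk
  end

  # Base64 of the file contents, as the attachment type expects.
  # pack('m0') produces Base64 without line breaks.
  def attachment_original
    [@raw_bytes].pack('m0') if @attributes['attachment_file_name']
  end

  # Serialize only the mapped attributes plus the computed attachment.
  def to_indexed_json
    doc = @attributes.select { |key, _| MAPPED_FIELDS.include?(key) }
    doc['attachment_original'] = attachment_original
    JSON.generate(doc)
  end
end

file = UserFile.new(
  {
    'id' => 5,
    'folder_id' => 1,
    'attachment_file_name' => 'hw4.pdf',
    'attachment_file_size' => 179_895,
    'attachment_content_type' => 'application/pdf',
    'attachment_updated_at' => '2012-08-16T11:32:41Z'
  },
  '%PDF-1.4'
)
puts file.to_indexed_json
```

With this shape, attachment_file_size and attachment_content_type never reach ElasticSearch, so they cannot be added to the mapping dynamically. The attachment itself is still sent, because it has to be indexed; whether it is kept in _source is then up to the excludes setting.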

See the ElasticSearch & Tire: Using Mapping and to_indexed_json question for more information on the topic.

It seems that Tire is indeed sending the proper mapping JSON to ElasticSearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening and try to pinpoint the problem at the raw level.
