Fields not in mapping are included in the search results returned by ElasticSearch
I want to index PDF attachments using the Tire gem as a client for ElasticSearch. In my mapping, I exclude the attachment field from _source so that the attachment is not stored in the index and not returned in the search results:
mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end
I can still see the attachment content included in the search results when I run the following curl command:
curl -X POST "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{
"query": {
"query_string": {
"query": "rspec"
}
}
}'
I have posted my question in this thread:
But I have just noticed that not only is the attachment included in the search results, but so are all the other fields, including the ones that are not mapped, as you can see here:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.025427073,
"hits": [
{
"_index": "user_files",
"_type": "user_file",
"_id": "5",
"_score": 0.025427073,
"_source": {
"user_file": {
"id": 5,
"folder_id": 1,
"updated_at": "2012-08-16T11:32:41Z",
"attachment_file_size": 179895,
"attachment_updated_at": "2012-08-16T11:32:41Z",
"attachment_file_name": "hw4.pdf",
"attachment_content_type": "application/pdf",
"created_at": "2012-08-16T11:32:41Z",
"attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}
}
}
]
}
}
attachment_file_size and attachment_content_type are not defined in the mapping, but are returned in the search results:
{
"id": 5,
"folder_id": 1,
"updated_at": "2012-08-16T11:32:41Z",
"attachment_file_size": 179895, <---------------------
"attachment_updated_at": "2012-08-16T11:32:41Z",
"attachment_file_name": "hw4.pdf", <------------------
"attachment_content_type": "application/pdf",
"created_at": "2012-08-16T11:32:41Z",
"attachment_original": "JVBERi0xLjQKJeLjz9MKNyA"
}
Here's my full implementation:
include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND" } if params[:query].present?
    #filter :term, folder_id: folder.id
    #highlight :attachment_original, :options => {:tag => "<em>"}
    raise to_curl
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    Base64.encode64(open(path_to_original) { |f| f.read })
  end
end
Could somebody help me figure out why all the fields are included in the _source?
Edit: This is the output of running localhost:9200/user_files/_mapping:
{
"user_files": {
"user_file": {
"_source": {
"excludes": [
"attachment_original"
]
},
"properties": {
"attachment_content_type": {
"type": "string"
},
"attachment_file_name": {
"type": "string"
},
"attachment_file_size": {
"type": "long"
},
"attachment_original": {
"type": "attachment",
"path": "full",
"fields": {
"attachment_original": {
"type": "string"
},
"author": {
"type": "string"
},
"title": {
"type": "string"
},
"name": {
"type": "string"
},
"date": {
"type": "date",
"format": "dateOptionalTime"
},
"keywords": {
"type": "string"
},
"content_type": {
"type": "string"
}
}
},
"attachment_updated_at": {
"type": "date",
"format": "dateOptionalTime"
},
"created_at": {
"type": "date",
"format": "dateOptionalTime"
},
"folder_id": {
"type": "integer"
},
"id": {
"type": "integer"
},
"updated_at": {
"type": "date",
"format": "dateOptionalTime"
}
}
}
}
}
As you can see, for some reason all the fields are included in the mapping!
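In your to_indexed_json, you include the attachment_original method, so it is sent to elasticsearch. The same to_json call also serializes all the other attributes of the model, and elasticsearch maps fields it has not seen before dynamically. That is why all your other properties are included in the mapping and, consequently, in _source.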
See the ElasticSearch & Tire: Using Mapping and to_indexed_json question for more information on the topic.
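To illustrate the point about to_indexed_json, here is a minimal sketch in plain Ruby of restricting the serialized document to the fields declared in the mapping. It is not Tire's API: build_indexed_json and MAPPED_FIELDS are hypothetical names introduced for this example, and the attributes hash merely stands in for the model's attributes from the question.

```ruby
require 'json'

# Fields declared in the mapping block of the question (assumed list).
MAPPED_FIELDS = %w[id folder_id attachment_file_name attachment_updated_at].freeze

# Hypothetical helper: builds the JSON document sent for indexing from a
# plain attributes hash, keeping only mapped fields plus the encoded attachment.
def build_indexed_json(attributes, attachment_base64 = nil)
  document = attributes.select { |name, _value| MAPPED_FIELDS.include?(name.to_s) }
  document['attachment_original'] = attachment_base64 if attachment_base64
  JSON.generate(document)
end

# Stand-in for the model attributes shown in the search result above.
attributes = {
  'id' => 5,
  'folder_id' => 1,
  'attachment_file_name' => 'hw4.pdf',
  'attachment_file_size' => 179895,
  'attachment_content_type' => 'application/pdf',
  'attachment_updated_at' => '2012-08-16T11:32:41Z'
}

# Unmapped attributes such as attachment_file_size are left out of the output.
puts build_indexed_json(attributes, 'JVBERi0xLjQKJeLjz9MKNyA')
```

In a Rails model, the equivalent restriction would presumably be expressed through the :only option of to_json alongside :methods, rather than a hand-written helper like this one.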
It seems that Tire is indeed sending the proper mapping JSON to elasticsearch. My advice is to use Tire.configure { logger STDERR, level: "debug" } to inspect what is happening and try to pinpoint the problem on the raw level.