Elasticsearch: Sort on different fields depending on type
Problem description
I have two types in my index (Event and City) and I'm trying to sort them all together by date. However, the date's field name is different for each type: for Event the value is in the updated_at field, and for City the date is in the updated_at field of one of the nested objects in its city_events nested object array (note the filtering by region_id).
I've tried specifying each field in the sort array like this:
"sort": [
  {
    "city_events.updated_at": {
      "order": "desc",
      "nested_path": "city_events",
      "nested_filter": {
        "term": {
          "city_events.region_id": 1
        }
      }
    }
  },
  {
    "updated_at": "desc"
  }
]
But unfortunately this doesn't mix the two types together. Instead, it first sorts all Cities by their nested city_events.updated_at field and then appends all Events at the bottom, sorted by their updated_at field. How do I mix and sort the two together?
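The ordering described here is what any multi-key sort produces: the second key only breaks ties on the first, so documents lacking the first key end up grouped together. Below is a minimal Python sketch of that effect. The sample documents are made up, dates are plain epoch-millisecond integers, and treating a missing value as "sorts last" and picking the largest matching nested value are assumptions of this sketch, not a description of Elasticsearch internals.

```python
# Hypothetical sample documents; dates are epoch milliseconds.
docs = [
    {"type": "Event", "updated_at": 300},
    {"type": "City", "city_events": [{"region_id": 1, "updated_at": 200}]},
    {"type": "Event", "updated_at": 100},
    {"type": "City", "city_events": [{"region_id": 1, "updated_at": 50},
                                     {"region_id": 2, "updated_at": 900}]},
]

# Stand-in for "no value": the smallest possible key, so it lands last
# when sorting in descending order (an assumption of this sketch).
MISSING = float("-inf")


def nested_key(doc, region_id=1):
    """First sort key: nested city_events.updated_at filtered by region."""
    values = [e["updated_at"]
              for e in doc.get("city_events", [])
              if e["region_id"] == region_id]
    return max(values) if values else MISSING


def top_level_key(doc):
    """Second sort key: the document's own updated_at, if it has one."""
    return doc.get("updated_at", MISSING)


# Two sort keys compared in order, descending, like the two-element sort array.
ordered = sorted(docs, key=lambda d: (nested_key(d), top_level_key(d)),
                 reverse=True)

for d in ordered:
    print(d["type"], nested_key(d), top_level_key(d))
```

Every Event ties on the first key (it has no city_events), so the Events only get ordered among themselves by the second key, after all Cities.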
As an alternative solution I've tried sorting only by the nested city_events.updated_at field and specifying "missing": "updated_at". However, that threw a "number_format_exception" error despite both fields being in the same format:
{
  "error": {
    "root_cause": [
      {
        "type": "number_format_exception",
        "reason": "For input string: \"updated_at\""
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "events_1461095196252",
        "node": "sYQstSw_SN62ojmXgGjPlg",
        "reason": {
          "type": "number_format_exception",
          "reason": "For input string: \"updated_at\""
        }
      }
    ]
  },
  "status": 400
}
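A likely reading of this error: the sort option missing takes _last, _first, or a literal replacement value, not the name of another field, so the string "updated_at" gets parsed as a number and fails. The Python sketch below only mimics that behaviour for illustration. resolve_missing is a hypothetical helper, not an Elasticsearch API, and the infinities stand in for "sorts last" and "sorts first".

```python
def resolve_missing(missing, descending):
    """Turn a `missing` setting into a numeric sort value (illustrative only)."""
    if missing == "_last":
        # Last in a descending sort means the smallest possible key.
        return float("-inf") if descending else float("inf")
    if missing == "_first":
        return float("inf") if descending else float("-inf")
    # Anything else is treated as a literal value for a numeric/date field.
    # A field name such as "updated_at" is not a number, so this raises.
    return int(missing)


print(resolve_missing("_last", descending=True))
print(resolve_missing("0", descending=True))

try:
    resolve_missing("updated_at", descending=True)
except ValueError as exc:
    print("rejected:", exc)
```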
UPDATE 1: Based on the answer by Andrei Stefan below, I've tried developing a groovy script that loops over city_events for each City document, selects the one with a matching region_id, and then returns that city_event's updated_at value for scoring, but I had problems accessing nested fields within the script: https://stackoverflow.com/questions/36781476/elasticsearch-access-fields-inside-array-of-nested-objects-in-a-groovy-script
Try script-based sorting. Your nested field would need include_in_parent: true to be accessible in the script:
"city_events": {
  "type": "nested",
  "include_in_parent": true,
  "properties": {
    "updated_at": {
      "type": "date"
    }
  }
}
And the sorting part:
"sort": {
  "_script": {
    "type": "number",
    "script": {
      "inline": "if (doc['_type'].value=='Event') return doc['updated_at'].date.getMillis(); else if (doc['_type'].value=='City') return doc['city_events.updated_at'].date.getMillis()",
      "lang": "groovy"
    },
    "order": "desc"
  }
}
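One thing to keep in mind with include_in_parent: the nested objects' fields are copied onto the parent as flat multi-valued fields, so a script reading doc['city_events.updated_at'] sees all the dates but no longer knows which region_id each one belonged to. The Python sketch below illustrates that loss of pairing. flatten_nested is a hypothetical helper, the document is made up, and the sorting of values is a simplification of how field values are stored.

```python
def flatten_nested(doc, path):
    """Copy nested objects' fields onto the parent as flat, multi-valued fields."""
    flat = {}
    for obj in doc.get(path, []):
        for field, value in obj.items():
            flat.setdefault(path + "." + field, []).append(value)
    # Values are kept per field with no link back to their original object;
    # sorting each list here makes that loss of pairing explicit.
    return {name: sorted(values) for name, values in flat.items()}


# Hypothetical City document; dates are epoch milliseconds.
city = {
    "city_events": [
        {"region_id": 2, "updated_at": 50},
        {"region_id": 1, "updated_at": 900},
    ]
}

flat = flatten_nested(city, "city_events")
print(flat)
```

In the flattened view, nothing says that 900 is the date for region 1, which is why filtering by region_id cannot be expressed through the doc[...] values alone.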
LATER EDIT
Even if I add the city_events.region_id==1 condition to the Groovy script, that will not feel like Elasticsearch: it would be pure Groovy programming rather than using the power of Elasticsearch.
I've tried other approaches (all in ES 2.3.1):
- copy_to from the regular updated_at field to a nested field inside Event, so that a regular nested sorting is performed over all types. This didn't work.
- Even if copy_to had worked, Elasticsearch wouldn't have matched "term": {"city_events.region_id": 1} from the sort part in the Event type (as region_id doesn't exist in Event), and for those documents it would have used -9223372036854776000 instead of the actual date (that value comes from tests I performed).
- Use a nested field in Event as well and, at indexing time, put that updated_at in this nested field. This will not work for the same reason as the second attempt above: there has to be a region_id in Event as well so that the nested filter from the sort part will apply to both types.
What I would suggest, as a proper way of dealing with this, is to rethink the data structure a bit so that the sorting part (at least) follows the Elasticsearch way of doing things. Your types are called City and Event, and inside City you have a list of (nested) city_events. Can't you include Event in City and duplicate the events' details in each city? This doesn't have to be a normalized, RDB-style data structure. On the contrary, ES is happier with denormalized data.
For completeness' sake, though I don't recommend this:
"sort": {
  "_script": {
    "type": "number",
    "script": {
      "inline": "if (doc['_type'].value=='Event') return doc['updated_at'].date.getMillis(); else if (doc['_type'].value=='City') {for(nestedObj in _source.city_events) {if(nestedObj.region_id==1) return nestedObj.updated_at.toLong();}}",
      "lang": "groovy"
    },
    "order": "desc"
  }
}
Note that I haven't done all the proper checks in the Groovy script above (checking whether there are actually nested objects in the document, for example).
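To make the intended logic and the missing checks concrete, here is a minimal Python sketch of the same per-document sort key. It is not the Groovy script itself: the sample documents are made up, dates are assumed to be epoch-millisecond integers, and the fallback value for documents without a usable date is an assumption of this sketch.

```python
# Fallback for documents without a usable date: sorts last when descending.
NO_DATE = float("-inf")


def sort_key(doc, region_id=1):
    """Return one comparable date per document, whatever its type."""
    if doc.get("_type") == "Event":
        return doc.get("updated_at", NO_DATE)
    if doc.get("_type") == "City":
        # Guard against a missing or empty city_events array.
        for nested in doc.get("city_events") or []:
            if (nested.get("region_id") == region_id
                    and nested.get("updated_at") is not None):
                return nested["updated_at"]
    return NO_DATE


# Hypothetical documents mixing both types.
docs = [
    {"_type": "City", "city_events": [{"region_id": 1, "updated_at": 200}]},
    {"_type": "Event", "updated_at": 100},
    {"_type": "City", "city_events": []},
    {"_type": "Event", "updated_at": 300},
    {"_type": "City", "city_events": [{"region_id": 2, "updated_at": 900},
                                      {"region_id": 1, "updated_at": 50}]},
]

ordered = sorted(docs, key=sort_key, reverse=True)
for d in ordered:
    print(d["_type"], sort_key(d))
```

Because every document is reduced to a single number, Events and Cities interleave by date, and documents with no matching nested object fall to the end instead of raising an error.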