Elasticsearch: Sort on different fields depending on type


Problem description


I have two types in my index (Event and City) and I'm trying to sort them all together by date. However, the date field's name is different for each type: for Event the value is in the updated_at field, and for City the date is in the updated_at field of one of the nested objects in its city_events nested object array (note the filtering by region_id).

I've tried specifying each field in the sort array like this:

  "sort": [
    {
      "city_events.updated_at": {
        "order": "desc",
        "nested_path": "city_events",
        "nested_filter": {
          "term": {
            "city_events.region_id": 1
          }
        }
      }
    },
    {
      "updated_at": "desc"
    }
  ]

But unfortunately this doesn't mix the two types together. Instead, it first sorts all Cities by their nested city_events.updated_at field and then appends all Events at the bottom sorted by their updated_at field. How do I mix and sort the two together?
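The behavior described above can be reproduced with a small toy model. The sketch below is plain Python, not Elasticsearch code; the document names, timestamps, and the nested_updated_at key are made up for illustration. It assumes that a document lacking a sort field is placed last for that clause, so the second clause only breaks ties inside the group that is missing the first field.

```python
# Toy model of a two-clause sort; documents and timestamps are made up.
docs = [
    {"type": "City", "name": "c1", "nested_updated_at": 300},
    {"type": "Event", "name": "e1", "updated_at": 400},
    {"type": "City", "name": "c2", "nested_updated_at": 100},
    {"type": "Event", "name": "e2", "updated_at": 200},
]


def desc_missing_last(value):
    # Descending order, with documents that lack the field placed last.
    return (value is None, -(value or 0))


def two_clause_key(doc):
    # Clause 1: the nested date (only City has it).
    # Clause 2: the top-level date (only Event has it); a tie-breaker only.
    return (
        desc_missing_last(doc.get("nested_updated_at")),
        desc_missing_last(doc.get("updated_at")),
    )


def merged_key(doc):
    # A single key per document, whichever field the type provides.
    return -doc.get("updated_at", doc.get("nested_updated_at"))


two_clause = [d["name"] for d in sorted(docs, key=two_clause_key)]
merged = [d["name"] for d in sorted(docs, key=merged_key)]
print(two_clause)  # ['c1', 'c2', 'e1', 'e2']
print(merged)      # ['e1', 'c1', 'e2', 'c2']
```

Only a single per-document key interleaves the two types, which is what the question is asking for.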

As an alternative solution I've tried sorting only by the nested city_events.updated_at field and specifying "missing": "updated_at", however that threw a "number_format_exception" error despite both fields being in the same format:

{
  "error": {
    "root_cause": [
      {
        "type": "number_format_exception",
        "reason": "For input string: \"updated_at\""
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query_fetch",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "events_1461095196252",
        "node": "sYQstSw_SN62ojmXgGjPlg",
        "reason": {
          "type": "number_format_exception",
          "reason": "For input string: \"updated_at\""
        }
      }
    ]
  },
  "status": 400
}
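A likely explanation for this error is that the missing option of a sort clause takes _last, _first, or a literal value to use as the sort value, not the name of another field, so the string "updated_at" ends up being parsed as a number. The sketch below is a plain Python model of that interpretation; parse_missing is a hypothetical helper, not part of Elasticsearch.

```python
def parse_missing(value):
    """Model of how a `missing` value on a numeric or date sort is read.

    Hypothetical helper: the keywords pass through, anything else must be
    a literal number.
    """
    if value in ("_last", "_first"):
        return value
    return int(value)  # raises ValueError for a non-numeric string


try:
    parse_missing("updated_at")
    error = None
except ValueError as exc:
    error = str(exc)

print(parse_missing("_last"))
print(parse_missing("1461095196252"))
print(error)
```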

UPDATE 1: Based on the answer by Andrei Stefan below, I've tried developing a Groovy script that loops over city_events for each City document, selects the one with a matching region_id, and returns that city_event's updated_at value for scoring, but I had problems accessing nested fields within the script: https://stackoverflow.com/questions/36781476/elasticsearch-access-fields-inside-array-of-nested-objects-in-a-groovy-script

Solution

Try script-based sorting. For the nested field to be accessible in the script, its mapping needs include_in_parent: true:

    "city_events": {
      "type": "nested",
      "include_in_parent": true, 
      "properties": {
        "updated_at": {
          "type": "date"
        }
      }
    }

And the sorting part:

  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "inline": "if (doc['_type'].value=='Event') return doc['updated_at'].date.getMillis(); else if (doc['_type'].value=='City') return doc['city_events.updated_at'].date.getMillis()",
        "lang": "groovy"
      },
      "order": "desc"
    }
  }
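One caveat with this approach: copying nested fields into the parent flattens them into one multi-valued field per sub-field, so the script can no longer tell which updated_at belongs to which region_id. The sketch below models that in plain Python; flatten is a hypothetical helper, and treating each flattened field as a sorted list of values is an assumption about how the values are exposed to the script.

```python
def flatten(doc, path):
    """Collect each sub-field of the objects under `path` into one value list."""
    flat = {}
    for obj in doc.get(path, []):
        for field, value in obj.items():
            flat.setdefault(f"{path}.{field}", []).append(value)
    # Assumption: multi-valued fields are seen as sorted lists of values.
    return {name: sorted(values) for name, values in flat.items()}


# Two cities whose regions carry opposite dates (made-up values).
city_a = {"city_events": [{"region_id": 1, "updated_at": 100},
                          {"region_id": 2, "updated_at": 900}]}
city_b = {"city_events": [{"region_id": 1, "updated_at": 900},
                          {"region_id": 2, "updated_at": 100}]}

flat_a = flatten(city_a, "city_events")
flat_b = flatten(city_b, "city_events")
print(flat_a)
print(flat_a == flat_b)  # True: the region/date pairing is gone
```

Both cities flatten to the same values, so a script reading the flattened field cannot pick the date for region_id 1.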


LATER EDIT

Even if I add the city_events.region_id==1 condition to the Groovy script, that will not feel like Elasticsearch: it would be pure Groovy programming and would not use the power of Elasticsearch.

I've tried other approaches (all in ES 2.3.1):

  • copy_to from the regular updated_at field to a nested field inside Event, so that a regular nested sort is performed over all types. This didn't work.
  • even if copy_to had worked, Elasticsearch wouldn't have matched "term": {"city_events.region_id": 1} from the sort part in the Event type (as region_id doesn't exist in Event), and for those documents it would have used -9223372036854776000 instead of the actual date (that value comes from tests I performed).
  • use a nested field in Event as well and, at indexing time, put updated_at in this nested field. This will not work for the same reason as attempt #2 above: there has to be a region_id in Event as well so that the nested filter from the sort part applies to both types.
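The value -9223372036854776000 in the second point looks like Java's Long.MIN_VALUE (-9223372036854775808) after a round trip through a double-precision number, as a JavaScript-based console would display it. That reading is an assumption, not something stated in the answer; the sketch below only checks that the two numbers are the same double.

```python
LONG_MIN = -2**63  # Java's Long.MIN_VALUE, -9223372036854775808

as_double = float(LONG_MIN)      # exactly representable: -2**63
reported = -9223372036854776000  # the value quoted in the answer

print(repr(as_double))                # -9.223372036854776e+18
print(float(reported) == as_double)   # True
```

If that reading is right, unmatched documents received the smallest possible sort value, which places them last in a descending sort.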

What I would suggest, as a proper way of dealing with this, is to rethink the data structure a bit so that the sorting part (at least) follows the Elasticsearch way of doing things. Your types are called City and Event, and inside City you have a list of (nested) city_events. Can't you include Event in City and duplicate the events' details in each city? This doesn't have to be a normalized, RDB-style data structure. On the contrary, ES is happier with non-normalized data.


For completeness' sake, though I don't recommend this:

  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "inline": "if (doc['_type'].value=='Event') return doc['updated_at'].date.getMillis(); else if (doc['_type'].value=='City') {for(nestedObj in _source.city_events) {if(nestedObj.region_id==1) return nestedObj.updated_at.toLong();}}",
        "lang": "groovy"
      },
      "order": "desc"
    }
  }

Note that I haven't done all the proper checks in the Groovy script above (checking if there are actually nested objects in the document for example).
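To show what those checks might look like, here is a plain Python model of the same per-document key with the missing cases handled. It is a sketch, not the Groovy script itself: sort_key, to_millis, and the MISSING sentinel are hypothetical names, and accepting dates either as epoch milliseconds or as ISO 8601 strings is an assumption about what _source contains.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MISSING = float("-inf")  # hypothetical sentinel: last in a descending sort


def to_millis(value):
    """Accept epoch milliseconds or an ISO 8601 string (assumed formats)."""
    if isinstance(value, (int, float)):
        return int(value)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def sort_key(doc, region_id=1):
    """One sort value per document, whatever its type."""
    if doc.get("_type") == "Event":
        value = doc.get("updated_at")
        return to_millis(value) if value is not None else MISSING
    if doc.get("_type") == "City":
        # Tolerate a missing or empty city_events array.
        for event in doc.get("city_events") or []:
            if event.get("region_id") == region_id and event.get("updated_at") is not None:
                return to_millis(event["updated_at"])
    return MISSING  # no matching nested object, or an unknown type


docs = [
    {"_type": "City", "name": "c1",
     "city_events": [{"region_id": 1, "updated_at": 3000}]},
    {"_type": "Event", "name": "e1", "updated_at": "1970-01-01T00:00:04Z"},
    {"_type": "City", "name": "c2",
     "city_events": [{"region_id": 2, "updated_at": 9000}]},
    {"_type": "Event", "name": "e2", "updated_at": 2000},
    {"_type": "City", "name": "c3"},
]

ordered = [d["name"] for d in sorted(docs, key=sort_key, reverse=True)]
print(ordered[:3])  # ['e1', 'c1', 'e2']
```

Cities without a matching region (c2) or without any city_events (c3) fall to the end instead of making the script fail.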
