ElasticSearch: aggregation on _score field w/ Groovy disabled


Problem description


Every example I've seen (e.g., ElasticSearch: aggregation on _score field?) for doing aggregations on or related to the _score field seems to require the usage of scripting. With ElasticSearch disabling dynamic scripting by default for security reasons, is there any way to accomplish this without resorting to loading a script file onto every ES node or re-enabling dynamic scripting?

My original aggregation looked like the following:

"aggs": {
    "terms_agg": {
        "terms": {
            "field": "field1",
            "order": {"max_score": "desc"}
        },
        "aggs": {
            "max_score": {
                "max": {"script": "_score"}
            },
            "top_terms": {
                "top_hits": {"size": 1}
            }
        }
    }
}

Specifying expression as the lang doesn't seem to work: ES throws an error stating that the score can only be accessed when it is being used to sort. I can't figure out any other way of ordering my buckets by the score field. Does anyone have any ideas?

Edit: To clarify, my restriction is not being able to modify the server-side. I.e., I cannot add or edit anything as part of the ES installation or configuration.

Solution

ElasticSearch, as of at least version 1.7.1 and possibly earlier, also offers Lucene's Expression scripting language. Because Expression is sandboxed by default, it can be used for dynamic inline scripts in much the same way that Groovy was. In our case, our production ES cluster had just been upgraded from 1.4.1 to 1.7.1, and we decided to stop using Groovy because of its non-sandboxed nature. We still wanted dynamic scripts, though, for their ease of deployment and the flexibility they offer as we continue to fine-tune our application and its search layer.

While writing a native Java script as a replacement for our dynamic Groovy function scores may also have been a possibility in our case, we wanted to look at the feasibility of using Expression as our dynamic inline scripting language instead. After reading through the documentation, I found that we could simply change the "lang" attribute from "groovy" to "expression" in our inline function_score scripts. With the script.inline: sandbox property set in the .../config/elasticsearch.yml file, the function score script worked without any other modification. As such, we can now continue to use dynamic inline scripting within ElasticSearch, and do so with sandboxing enabled (as Expression is sandboxed by default). Obviously, other security measures, such as running your ES cluster behind an application proxy and firewall, should also be implemented to ensure that outside users have no direct access to your ES nodes or the ES API. However, this was a very simple change that, for now, has solved the problem of Groovy's lack of sandboxing and the concerns over enabling it to run without sandboxing.
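To illustrate the change described above, here is a minimal sketch in Python that builds a request body in the ES 1.x style, where "lang" sits next to "script" inside script_score. The field names (title, popularity), the script text, and the helper name build_function_score_body are all hypothetical and not taken from our actual queries. The only thing the example is meant to show is that the two bodies differ in the "lang" value alone.

```python
import json


def build_function_score_body(lang="expression"):
    """Build an ES 1.x-style search body with an inline script_score.

    The field names ("title", "popularity") and the script are hypothetical;
    only the "lang" value differs between the Groovy and Expression variants.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": "elasticsearch"}},
                "script_score": {
                    # Plain arithmetic on _score and a numeric field; this
                    # assumes the script is simple enough for Expression.
                    "script": "_score * doc['popularity'].value",
                    "lang": lang,
                },
            }
        }
    }


groovy_body = build_function_score_body(lang="groovy")
expression_body = build_function_score_body(lang="expression")

# Print the Expression variant as the JSON that would be sent to ES.
print(json.dumps(expression_body, indent=2))
```

Scripts that rely on loops, conditionals beyond the ternary operator, or non-numeric fields would likely need more than a "lang" swap, since Expression only evaluates numeric expressions.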

Switching your dynamic scripts to Expression may only work or be applicable in some cases (depending on the complexity of your inline dynamic scripts), but it seemed worth sharing this information in the hope that it helps other developers.

As a note, Mustache, one of the other supported ES scripting languages, appears to be usable only for creating templates within your search queries. It does not appear to be usable for any of the more complex scripting needs such as function_score, although I am not sure this was entirely apparent on a first read through the updated ES documentation.
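For contrast, the sketch below shows the kind of thing Mustache is used for: substituting parameters into a query template. It assumes the body shape of the ES 1.x search template API (a "template" plus "params"). The field name title and the parameter name query_string are hypothetical. Nothing here computes a score; the {{...}} placeholder is only filled in with a value.

```python
import json

# Assumed ES 1.x search-template body: Mustache placeholders in "template"
# are replaced with the values given in "params" before the query runs.
search_template_body = {
    "template": {
        "query": {"match": {"title": "{{query_string}}"}}
    },
    "params": {"query_string": "aggregation on score"},
}

print(json.dumps(search_template_body, indent=2))
```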

Lastly, one further issue to be mindful of is that Lucene Expression scripting is marked as an experimental feature in the latest ES release. The documentation notes that, because this scripting extension is undergoing significant development work at this time, its usage or functionality may change in later versions of ES. Thus, if you do switch over to using Expression for any of your scripts (dynamic or otherwise), note this in your documentation/developer notes so that you revisit these changes before the next upgrade of your ES installation and confirm that your scripts remain compatible and work as expected.
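One way to keep track of which queries need revisiting is to scan the request bodies your application builds for their "lang" values. The helper below, find_script_langs, is a hypothetical sketch and not part of any ES client; it assumes the bodies are available as nested Python dicts and lists.

```python
def find_script_langs(node, path=""):
    """Yield (path, lang) for every string-valued "lang" key in a nested body."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key == "lang" and isinstance(value, str):
                yield child, value
            else:
                yield from find_script_langs(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_script_langs(item, f"{path}[{index}]")


# Hypothetical request body used only to exercise the helper.
body = {
    "query": {
        "function_score": {
            "script_score": {"script": "_score * 2", "lang": "expression"}
        }
    }
}

for location, lang in find_script_langs(body):
    print(location, lang)
```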

For our situation at least, unless we were willing to allow non-sandboxed dynamic scripting to be enabled again in the latest version of ES (via the script.inline: on option) so that inline Groovy scripts could continue to run, switching over to Lucene Expression scripting seemed like the best option for now.

It will be interesting to see what changes occur to the scripting choices for ES in future releases, especially given that the (apparently ineffective) sandboxing option for Groovy will be completely removed by version 2.0. Hopefully other protections can be put in place to enable dynamic Groovy usage, or perhaps Lucene Expression scripting will take Groovy's place and will enable all the types of dynamic scripting that developers are already making use of.

For more notes on Lucene Expression see the ES documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts – this page is also the source of the note regarding the planned removal of Groovy's sandboxing option from ES v2.0+. Further Lucene Expression documentation can be found here: http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html
