Fetching esJsonRDD from Elasticsearch with complex filtering in Spark


Question

I am currently fetching an Elasticsearch RDD in our Spark job, filtering with a one-line Elasticsearch query such as this (example):

val elasticRdds = sparkContext.esJsonRDD(esIndex, s"?default_operator=AND&q=director.name:DAVID + \n movie.name:SEVEN")

Now suppose our search query becomes more complex, like this:

{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "default_operator": "AND",
                    "query": "director.name:DAVID + \n movie.name:SEVEN"
                }
            },
            "filter": {
                "nested": {
                    "path": "movieStatus.boxoffice.status",
                    "query": {
                        "bool": {
                            "must": [
                                {
                                    "match": {
                                        "movieStatus.boxoffice.status.rating": "A"
                                    }
                                },
                                {
                                    "match": {
                                        "movieStatus.boxoffice.status.oscar": "false"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }
}

Can I still convert that query to an inline Elasticsearch query to use it with esJsonRDD? Or is there any way to use the above query as is with esJsonRDD? If not, what is a better way to fetch such RDDs in Spark?

I ask because esJsonRDD seems to accept only inline (one-line) Elasticsearch queries.

Recommended answer

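Use triple quotes. A triple-quoted Scala string can span multiple lines, so the JSON query can be passed to esJsonRDD as is: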

val query = """{
"query": {
    "filtered": {
        "query": {
            "query_string": {
                "default_operator": "AND",
                "query": "director.name:DAVID + \n movie.name:SEVEN"
            }
        },
        "filter": {
            "nested": {
                "path": "movieStatus.boxoffice.status",
                "query": {
                    "bool": {
                        "must": [
                            {
                                "match": {
                                    "movieStatus.boxoffice.status.rating": "A"
                                }
                            },
                            {
                                "match": {
                                    "movieStatus.boxoffice.status.oscar": "false"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}
}"""

val elasticRdds = sparkContext.esJsonRDD(esIndex, query)
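
For context, here is a minimal sketch of a whole job built around that call. It assumes the elasticsearch-spark (elasticsearch-hadoop) connector is on the classpath. The app name, master setting, node address, and index name are placeholders chosen for illustration, not values from the question, and the query is a shortened stand-in for the full one above. Inside triple quotes Scala does not interpret `\n`, so Elasticsearch receives it as an ordinary JSON escape.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // brings esJsonRDD into scope on SparkContext

object FetchFilteredMovies {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fetch-filtered-movies") // hypothetical app name
      .setMaster("local[*]")               // assumption: local test run
      .set("es.nodes", "localhost:9200")   // assumption: local Elasticsearch node
    val sparkContext = new SparkContext(conf)

    val esIndex = "movies" // hypothetical index name

    // Shortened stand-in for the full query; any multi-line
    // query DSL body is passed the same way.
    val query = """{
      "query": {
        "query_string": {
          "default_operator": "AND",
          "query": "director.name:DAVID + \n movie.name:SEVEN"
        }
      }
    }"""

    // Each element is a (document id, raw JSON source) pair.
    val elasticRdds = sparkContext.esJsonRDD(esIndex, query)
    elasticRdds.take(5).foreach { case (id, json) => println(s"$id -> $json") }

    sparkContext.stop()
  }
}
```

Note that the `filtered` query used in the question was removed in Elasticsearch 5.0. On newer clusters the same logic is usually written as a `bool` query with `must` and `filter` clauses, and the triple-quote approach works the same way for that form.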
