Reading ES from spark with elasticsearch-spark connector: all the fields are returned


Problem description


I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking spark:

] $SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar


In the scala shell:

scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")


It works well, the result contains the good records as specified in myquery. The only thing is that I get all the fields, even if I specify a subset of these fields in the query. Example:

myquery = """{"query":..., "fields":["a","b"], "size":10}"""


returns all the fields, not only a and b (BTW, I noticed that size parameter is not taken in account neither : result contains more than 10 records). Maybe it's important to add that fields are nested, a and b are actually doc.a and doc.b.


Is it a bug in the connector or do I have the wrong syntax?

Recommended answer


With the RDD API, the spark elasticsearch connector returns whole documents and does not honor `fields`, so you cannot apply a projection this way.


If you wish to have fine-grained control over the mapping, you should use a DataFrame instead, which is basically an RDD plus a schema.


The `pushdown` option should also be enabled, so that Spark SQL filters are translated (pushed down) into the Elasticsearch Query DSL.
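To make the push-down idea concrete, here is a small hypothetical sketch in plain Scala (no Spark needed) of the kind of rewrite pushdown performs: a Spark SQL predicate such as "greater than" on a column becomes, roughly, an Elasticsearch range query. The helper name and shape are illustrative, not the connector's actual internals.

```scala
// Hypothetical illustration of the rewrite pushdown performs:
// a "greater than" filter on a field becomes a Query DSL range clause.
def rangeQuery(field: String, gt: Int): String =
  s"""{"query":{"range":{"$field":{"gt":$gt}}}}"""

// e.g. a filter like df("a") > 10 would be sent to ES roughly as:
println(rangeQuery("a", 10))
```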


Now a semi-full example:

val myQuery = """{"query": ... }"""
val df = spark.read.format("org.elasticsearch.spark.sql")
                     .option("query", myQuery)   // the raw query, without "fields" or "size"
                     .option("pushdown", "true") // translate Spark SQL filters into Query DSL
                     .load("myindex/mytype")
                     .limit(10)                  // instead of size
                     .select("a", "b")           // instead of fields
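Separately from projecting at the DataFrame level, elasticsearch-hadoop also exposes source-filtering read settings that trim documents on the Elasticsearch side. A hedged config sketch, assuming the standard elasticsearch-hadoop configuration keys (verify against the docs for your connector version):

```
# elasticsearch-hadoop read setting: only read these fields from _source
es.read.field.include = a,b
```

In Spark this is passed like any other reader option, e.g. `.option("es.read.field.include", "a,b")` on the same `spark.read.format(...)` chain.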

