Reading ES from spark with elasticsearch-spark connector: all the fields are returned
Problem description
I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking spark:
$SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar
In the scala shell:
scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")
It works well; the result contains the records matched by myquery. The only issue is that I get all the fields back, even though I specify a subset of fields in the query. Example:
myquery = """{"query":..., "fields":["a","b"], "size":10}"""
returns all the fields, not only a and b. (BTW, I noticed that the size parameter is not taken into account either: the result contains more than 10 records.) Maybe it's important to add that the fields are nested; a and b are actually doc.a and doc.b.
Is it a bug in the connector or do I have the wrong syntax?
Answer
The spark elasticsearch connector uses fields, thus you cannot apply projection.
If you wish to have fine-grained control over the mapping, you should use a DataFrame instead, which is basically an RDD plus a schema.
The pushdown predicate should also be enabled to translate (push down) Spark SQL into the Elasticsearch Query DSL.
Now, a semi-full example:
val myQuery = """{"query":..., """
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("query", myQuery)
  .option("pushdown", "true")
  .load("myindex/mytype")
  .limit(10)        // instead of size
  .select("a", "b") // instead of fields
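As a side note, elasticsearch-hadoop also supports filtering fields at the source level through its `es.read.field.include` / `es.read.field.exclude` configuration options, so the documents are trimmed before they ever reach Spark. A minimal sketch (assuming the same index and the nested field names doc.a and doc.b from the question; this is configuration, so it needs a live Elasticsearch to actually run against):

```scala
// Sketch: source-level field filtering via elasticsearch-hadoop options.
// "es.read.field.include" restricts which _source fields are fetched,
// independently of any Spark-side projection.
val dfTrimmed = spark.read.format("org.elasticsearch.spark.sql")
  .option("pushdown", "true")
  .option("es.read.field.include", "doc.a,doc.b") // comma-separated field list
  .load("myindex/mytype")
  .limit(10)
```

This can reduce network transfer when the documents are large, whereas `.select` alone only drops columns after they have been read.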