Reading ES from spark with elasticsearch-spark connector: all the fields are returned
Question
I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking spark:
$SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar
In the Scala shell:
scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")
It works well; the result contains the correct records as specified in myquery. The only thing is that I get all the fields, even if I specify a subset of these fields in the query. Example:
myquery = """{"query":..., "fields":["a","b"], "size":10}"""
returns all the fields, not only a and b (BTW, I noticed that the size parameter is not taken into account either: the result contains more than 10 records). Maybe it's important to add that the fields are nested; a and b are actually doc.a and doc.b.
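As a side note on the query body itself: in Elasticsearch 5.x, a top-level fields clause no longer performs source filtering; restricting the returned fields in the query is done with _source instead. A sketch of such a body, assuming the nested paths from the question and using a placeholder match_all query:

```json
{
  "query": { "match_all": {} },
  "_source": ["doc.a", "doc.b"],
  "size": 10
}
```

Note that even with a valid body, the connector paginates results with the scroll API, so size acts as a per-batch hint rather than a hard cap on the total number of records.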
Is it a bug in the connector or do I have the wrong syntax?
Answer
The spark elasticsearch connector uses fields, thus you cannot apply projection.
If you wish to have fine-grained control over the mapping, you should be using DataFrames instead, which are basically RDDs plus a schema.
The pushdown predicate should also be enabled to translate (push down) Spark SQL into the Elasticsearch Query DSL.
Now a semi-full example:
val myQuery = """{"query": ...}"""
val df = spark.read.format("org.elasticsearch.spark.sql")
.option("query", myQuery)
.option("pushdown", "true")
.load("myindex/mytype")
.limit(10) // instead of size
.select("a","b") // instead of fields
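If the RDD API must be kept, the connector also exposes a setting, es.read.field.include, that restricts which fields it reads back regardless of the query body. A sketch, assuming the nested paths doc.a and doc.b from the question:

```scala
import org.elasticsearch.spark._

// Connector-level field filtering: only doc.a and doc.b are returned.
val cfg = Map("es.read.field.include" -> "doc.a,doc.b")
val es_rdd = sc.esRDD("myindex/mytype", "myquery", cfg)
```

This filters at the connector level rather than via Spark SQL projection, so it works with esRDD, but it does not enable query pushdown the way the DataFrame approach does.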