Reading ES from spark with elasticsearch-spark connector: all the fields are returned


Problem description


I've done some experiments in the spark-shell with the elasticsearch-spark connector. Invoking spark:

] $SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar


In the scala shell:

scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")


It works well, the result contains the good records as specified in myquery. The only thing is that I get all the fields, even if I specify a subset of these fields in the query. Example:

myquery = """{"query":..., "fields":["a","b"], "size":10}"""


returns all the fields, not only a and b (BTW, I noticed that size parameter is not taken in account neither : result contains more than 10 records). Maybe it's important to add that fields are nested, a and b are actually doc.a and doc.b.


Is it a bug in the connector or do I have the wrong syntax?

Recommended answer


With the RDD API, the spark elasticsearch connector returns whole documents and does not honor `fields`, so you cannot apply a projection this way.


If you wish to have fine-grained control over the mapping, you should use a DataFrame instead, which is basically an RDD plus a schema.


The `pushdown` option should also be enabled, so that Spark SQL filters are translated (pushed down) into the Elasticsearch Query DSL.
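To make the push-down idea concrete, here is a small hypothetical sketch in plain Scala (no Spark needed) of the kind of rewrite pushdown performs: a Spark SQL predicate such as "greater than" on a column becomes, roughly, an Elasticsearch range query. The helper name and shape are illustrative, not the connector's actual internals.

```scala
// Hypothetical illustration of the rewrite pushdown performs:
// a "greater than" filter on a field becomes a Query DSL range clause.
def rangeQuery(field: String, gt: Int): String =
  s"""{"query":{"range":{"$field":{"gt":$gt}}}}"""

// e.g. a filter like df("a") > 10 would be sent to ES roughly as:
println(rangeQuery("a", 10))
```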


Now a semi-full example:

val myQuery = """{"query": ... }"""
val df = spark.read.format("org.elasticsearch.spark.sql")
                     .option("query", myQuery)   // the raw query, without "fields" or "size"
                     .option("pushdown", "true") // translate Spark SQL filters into Query DSL
                     .load("myindex/mytype")
                     .limit(10)                  // instead of size
                     .select("a", "b")           // instead of fields
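Separately from projecting at the DataFrame level, elasticsearch-hadoop also exposes source-filtering read settings that trim documents on the Elasticsearch side. A hedged config sketch, assuming the standard elasticsearch-hadoop configuration keys (verify against the docs for your connector version):

```
# elasticsearch-hadoop read setting: only read these fields from _source
es.read.field.include = a,b
```

In Spark this is passed like any other reader option, e.g. `.option("es.read.field.include", "a,b")` on the same `spark.read.format(...)` chain.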

