Scala: Read Array value in Elasticsearch with Spark

Question

I am trying to read data from Elasticsearch, but the document I want to read contains a nested array (which I also want to read).

I included the option "es.read.field.as.array.include" as follows:

val dataframe = reader
            .option("es.read.field.as.array.include","arrayField")
            .option("es.query", "someQuery")
            .load("Index/Document")

But I get this error:

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to java.lang.Float

How should I map my array so that I can read it?

Sample data from ES:

{
    "_index": "Index",
    "_type": "Document",
    "_id": "ID",
    "_score": 1,
    "_source": {
        "currentTime": 1516211640000,
        "someField": someValue,
        "arrayField": [
        {
            "id": "000",
            "field1": 14,
            "field2": 20.23871387052084,
            "innerArray": [[ 55.2754,25.1909],[ 55.2754,25.190929],[ 55.27,25.190]]
        }, ...
        ],
        "meanError": 0.3082
    }
}


Recommended answer

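In your sample data, innerArray is an array of arrays, so it has to be declared as a two-dimensional array. Append the depth to the field name in the option value: arrayField.innerArray:2.

You can try this sample: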

val es = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include","arrayField,arrayField.innerArray:2")
  .option("es.query", "someQuery")
  .load("Index/Document")

This gives the following schema and data:

 |-- arrayField: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- field1: long (nullable = true)
 |    |    |-- field2: float (nullable = true)
 |    |    |-- id: string (nullable = true)
 |    |    |-- innerArray: array (nullable = true)
 |    |    |    |-- element: array (containsNull = true)
 |    |    |    |    |-- element: float (containsNull = true)
 |-- currentTime: long (nullable = true)
 |-- meanError: float (nullable = true)
 |-- someField: string (nullable = true)


 +--------------------+-------------+---------+---------+
 |          arrayField|  currentTime|meanError|someField|
 +--------------------+-------------+---------+---------+
 |[[14,20.238714,00...|1516211640000|   0.3082|someValue|
 +--------------------+-------------+---------+---------+
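
Once the array columns load with the schema above, a common next step is to flatten them. The sketch below is only an illustration and makes several assumptions: it builds a small local DataFrame with the same shape instead of reading from Elasticsearch, and the names Item, Doc, NestedArrayExample, flattenPoints, "first" and "second" are hypothetical and do not come from the question. It uses Spark's explode function, one generator per select.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case classes mirroring the schema printed above.
case class Item(id: String, field1: Long, field2: Float, innerArray: Seq[Seq[Float]])
case class Doc(currentTime: Long, meanError: Float, someField: String, arrayField: Seq[Item])

object NestedArrayExample {

  // Produces one row per inner pair of innerArray.
  // Spark allows only one generator (explode) per select, hence the chained selects.
  def flattenPoints(df: DataFrame): DataFrame =
    df.select(col("currentTime"), explode(col("arrayField")).as("item"))
      .select(col("currentTime"), col("item.id").as("id"),
        explode(col("item.innerArray")).as("pair"))
      .select(col("currentTime"), col("id"),
        col("pair").getItem(0).as("first"),
        col("pair").getItem(1).as("second"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; no Elasticsearch connection is made.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-array-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame returned by the Elasticsearch reader.
    val df = Seq(
      Doc(1516211640000L, 0.3082f, "someValue",
        Seq(Item("000", 14L, 20.238714f,
          Seq(Seq(55.2754f, 25.1909f), Seq(55.2754f, 25.190929f), Seq(55.27f, 25.190f)))))
    ).toDF()

    flattenPoints(df).show(false)
    spark.stop()
  }
}
```

With a real cluster, the same flattenPoints call should apply to the DataFrame named es in the answer, provided it was loaded with the arrayField.innerArray:2 setting so that innerArray is typed as an array of arrays.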
