Spark Scala - Nested StructType conversion to Map


Problem description

I am using Spark 1.6 in Scala.

I created an index in ElasticSearch with an object. The object "params" was created as a Map[String, Map[String, String]]. Example:

val params: Map[String, Map[String, String]] = Map(
  "p1" -> Map("p1_detail" -> "table1"),
  "p2" -> Map("p2_detail" -> "table2", "p2_filter" -> "filter2"),
  "p3" -> Map("p3_detail" -> "table3")
)

That gives me records that look like the following:

{
        "_index": "x",
        "_type": "1",
        "_id": "xxxxxxxxxxxx",
        "_score": 1,
        "_timestamp": 1506537199650,
        "_source": {
           "a": "toto",
           "b": "tata",
           "c": "description",
           "params": {
              "p1": {
                 "p1_detail": "table1"
              },
              "p2": {
                 "p2_detail": "table2",
                 "p2_filter": "filter2"
              },
              "p3": {
                 "p3_detail": "table3"
              }
           }
        }
     },

Then I am trying to read the Elasticsearch index in order to update the values.

Spark reads the index with the following schema:

|-- a: string (nullable = true)
|-- b: string (nullable = true)
|-- c: string (nullable = true)
|-- params: struct (nullable = true)
|    |-- p1: struct (nullable = true)
|    |    |-- p1_detail: string (nullable = true)
|    |-- p2: struct (nullable = true)
|    |    |-- p2_detail: string (nullable = true)
|    |    |-- p2_filter: string (nullable = true)
|    |-- p3: struct (nullable = true)
|    |    |-- p3_detail: string (nullable = true)

My problem is that the object is read as a struct. In order to manage and easily update the fields, I want to have a Map instead, as I am not very familiar with StructType.

I tried to get the object in a UDF as a Map, but I get the following error:

 User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(params)' due to data type mismatch: argument 1 requires map<string,map<string,string>> type, however, 'params' is of struct<p1:struct<p1_detail:string>,p2:struct<p2_detail:string,p2_filter:string>,p3:struct<p3_detail:string>> type.;

UDF code snippet:

val getSubField: Map[String, Map[String, String]] => String =
  (params: Map[String, Map[String, String]]) => {
    val return_string = params("p1").getOrElse("p1_detail", null.asInstanceOf[String])
    return_string
  }
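The lookup logic in that UDF can be sanity-checked against a plain Scala nested map, independently of Spark; a minimal sketch, using sample values taken from the record above:

```scala
// Nested map mirroring the "params" object from the index
val sample: Map[String, Map[String, String]] = Map(
  "p1" -> Map("p1_detail" -> "table1"),
  "p2" -> Map("p2_detail" -> "table2", "p2_filter" -> "filter2"),
  "p3" -> Map("p3_detail" -> "table3")
)

// Same lookup as the UDF body: the inner getOrElse supplies a default
val p1Detail = sample("p1").getOrElse("p1_detail", null.asInstanceOf[String])

// Safer variant that also tolerates a missing outer key
val p2Filter = sample.get("p2").flatMap(_.get("p2_filter")).getOrElse("none")
```

Note that `sample("p1")` still throws if the outer key is absent; the `get`/`flatMap` variant avoids that.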

My question: how can we convert this struct to a Map? I already saw the toMap method in the documentation, but as a Scala beginner I cannot figure out how to use it (I am not very familiar with implicit parameters).
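As a side note on toMap: it is a Scala collections method, available on any collection of pairs through an implicit evidence parameter, and it is not defined on Spark's Row, which is why it cannot be called on the struct directly. A minimal plain-Scala sketch:

```scala
// toMap needs implicit evidence that the elements are (key, value) pairs;
// on a Seq of tuples that evidence is supplied automatically
val pairs = Seq("p1" -> "table1", "p2" -> "table2", "p2" -> "override")
val asMap = pairs.toMap  // duplicate keys: the last pair wins
```

This is why the accepted approach below first extracts the Row's fields into pairs before calling toMap.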

Thanks in advance.

Answer

I finally solved it as follows:

/* Converts a Row to a Map[String, T], skipping fields that are null */
def convertRowToMap[T](row: Row): Map[String, T] = {
  row.schema.fieldNames
    .filter(field => !row.isNullAt(row.fieldIndex(field)))
    .map(field => field -> row.getAs[T](field))
    .toMap
}

/* UDF that converts the params struct (a Row whose fields are Rows) to a Map */
val rowToMap: Row => Map[String, Map[String, String]] = (row: Row) => {
  val map_temp = convertRowToMap[Row](row)
  map_temp.map { case (k, v) => k -> convertRowToMap[String](v) }
}
val udfrowToMap = udf(rowToMap)

