Spark map dataframe using the dataframe's schema


Question

I have a dataframe, created from a JSON object. I can query this dataframe and write it to parquet.

Since I infer the schema, I don't necessarily know what's in the dataframe.

Is there a way to get the column names out, or to map the dataframe using its own schema?

// The results of SQL queries are DataFrames and support all the normal  RDD operations.
// The columns of a row in the result can be accessed by field index:
df.map(t => "Name: " + t(0)).collect().foreach(println)

// or by field name:
df.map(t => "Name: " + t.getAs[String]("name")).collect().foreach(println)

// row.getValuesMap[T] retrieves multiple columns at once into a Map[String, T]
df.map(_.getValuesMap[Any](List("name", "age"))).collect().foreach(println)
// Map("name" -> "Justin", "age" -> 19)

I would like to do something like

df.map (_.getValuesMap[Any](ListAll())).collect().foreach(println)
// Map ("name" -> "Justin", "age" -> 19, "color" -> "red")

without knowing the actual amount or names of the columns.

Answer

Well, you can, but the result is rather useless:

import org.apache.spark.sql.Row
import spark.implicits._  // needed for toDF on a local Seq

val df = Seq(("Justin", 19, "red")).toDF("name", "age", "color")

def getValues(row: Row, names: Seq[String]) = names.map(
  name => name -> row.getAs[Any](name)
).toMap

val names = df.columns
df.rdd.map(getValues(_, names)).first

// scala.collection.immutable.Map[String,Any] = 
//   Map(name -> Justin, age -> 19, color -> red)
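Since each `Row` returned by `df.rdd` carries its own schema, the same result can be had with the built-in `getValuesMap` and the row's own field names, without pre-capturing the column list. A minimal self-contained sketch (the `local[*]` session setup is for illustration only; any existing session works the same way):

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("schema-map")
  .getOrCreate()
import spark.implicits._

val df = Seq(("Justin", 19, "red")).toDF("name", "age", "color")

// Each Row carries its schema, so the column names
// need not be known in advance
val maps = df.rdd
  .map(row => row.getValuesMap[Any](row.schema.fieldNames.toSeq))
  .collect()

maps.foreach(println)
// Map(name -> Justin, age -> 19, color -> red)

spark.stop()
```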

To get something actually useful, one would need a proper mapping between SQL types and Scala types. It is not hard in simple cases, but it is hard in general. For example, there is no built-in type which can be used to represent an arbitrary struct. This can be done using a little bit of metaprogramming, but arguably it is not worth all the fuss.
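A minimal sketch of what such a mapping could look like in the simple cases (the helper name `rowToMap` is illustrative, not a library API): walk the schema and dispatch on each field's `DataType`, recursing into nested structs.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Hypothetical helper: turn a Row into a plain Scala Map by walking its
// schema. Only the easy cases are handled; nested structs recurse,
// everything else passes through as the raw value.
def rowToMap(row: Row, schema: StructType): Map[String, Any] =
  schema.fields.zipWithIndex.map { case (f, i) =>
    val converted = (row.get(i), f.dataType) match {
      case (null, _)                => null
      case (r: Row, st: StructType) => rowToMap(r, st)  // nested struct -> nested Map
      case (other, _)               => other            // primitives map through directly
    }
    f.name -> converted
  }.toMap
```

Note that this accesses fields by index, so it also works on plain `Row` instances that do not carry a schema of their own.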
