Spark RDD: correct date format in Scala?


Question

This is the date value I want to use when converting an RDD to a DataFrame:

Sun Jul 31 10:21:53 PDT 2016

Declaring the column as DataTypes.DateType in the schema throws an error:

java.util.Date is not a valid external type for schema of date


So I want to prepare the RDD in advance so that the schema above works. How can I correct the date value so that the conversion to a DataFrame succeeds?

// Schema for the DataFrame
val schema =
  StructType(
    StructField("lotStartDate", DateType, false) ::
    StructField("pm", StringType, false) ::
    StructField("wc", LongType, false) ::
    StructField("ri", StringType, false) :: Nil)

// rddRow: [Sun Jul 31 10:21:53 PDT 2016,"PM",11,"ABC"]
val df = spark.createDataFrame(rddRow, schema)


Recommended answer

Spark's DateType can be encoded from java.sql.Date, but not from java.util.Date, so you should convert your input RDD to use that type, e.g.:

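import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import spark.implicits._ // needed for toDF below; assumes a SparkSession named `spark`

val inputRdd: RDD[(Int, java.util.Date)] = ??? // however it's created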

// convert java.util.Date to java.sql.Date:
val fixedRdd = inputRdd.map {
  case (id, date) => (id, new java.sql.Date(date.getTime))
}

// now you can convert to DataFrame given your schema:
val schema = StructType(
  StructField("id", IntegerType) :: 
  StructField("date", DateType) :: 
  Nil
)

val df = spark.createDataFrame(
  fixedRdd.map(record => Row.fromSeq(record.productIterator.toSeq)),
  schema
)

// or, even easier - let Spark figure out the schema:
val df2 = fixedRdd.toDF("id", "date")

// both yield the same column names and types in this case
// (nullability may differ: toDF marks the Int column as non-nullable)
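
The same conversion can be applied to the four-field row from the question. The following is only a minimal sketch: it assumes the RDD elements are available as tuples of (lotStartDate, pm, wc, ri), and the names DateFix, RawRecord, FixedRecord and fixRecord are hypothetical helpers introduced for illustration. The Spark calls are shown as comments so the conversion itself can be checked without a cluster.

```scala
import java.sql.{Date => SqlDate}
import java.util.{Date => UtilDate}

object DateFix {
  // Hypothetical record shape mirroring the question's row:
  // (lotStartDate, pm, wc, ri)
  type RawRecord   = (UtilDate, String, Long, String)
  type FixedRecord = (SqlDate, String, Long, String)

  // Replace the java.util.Date field with a java.sql.Date built from the
  // same epoch milliseconds; the other fields pass through unchanged.
  def fixRecord(r: RawRecord): FixedRecord = r match {
    case (lotStartDate, pm, wc, ri) =>
      (new SqlDate(lotStartDate.getTime), pm, wc, ri)
  }
}

// With Spark on the classpath, and assuming rawRdd: RDD[DateFix.RawRecord],
// the function would be applied per element before building rows:
//   val fixedRdd = rawRdd.map(DateFix.fixRecord)
//   val rowRdd   = fixedRdd.map(t => Row(t._1, t._2, t._3, t._4))
//   val df       = spark.createDataFrame(rowRdd, schema)
```

Note that DateType holds only a calendar date, so the time of day in a value such as "Sun Jul 31 10:21:53 PDT 2016" is not kept in the resulting column. If the time matters, TimestampType with java.sql.Timestamp is the usual alternative.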
