Format TimestampType in Spark DataFrame (Scala)

Question

When I cast a string field to TimestampType in a Spark DataFrame, the output value comes with microsecond precision (yyyy-MM-dd HH:mm:ss.S). I need the format to be yyyy-MM-dd HH:mm:ss, i.e., without the fractional part. I also want to save this as a timestamp field when writing to a Parquet file, so the data type of my field should be a timestamp in the format yyyy-MM-dd HH:mm:ss.

I tried using TimestampType as

col("column_A").cast(TimestampType)
or
col("column_A").cast("timestamp")

to cast the field to timestamp. Both succeed in casting the field to timestamp, but the value still shows the microsecond precision.

Can anyone help me save the timestamp data type to a Parquet file with the required format specification?

EDIT

Input:

val a = sc.parallelize(List(("a", "2017-01-01 12:02:00.0"), ("b", "2017-02-01 11:22:30"))).toDF("cola", "colb")
scala> a.withColumn("datetime", date_format(col("colb"), "yyyy-MM-dd HH:mm:ss")).show(false)
+----+---------------------+-------------------+
|cola|colb                 |datetime           |
+----+---------------------+-------------------+
|a   |2017-01-01 12:02:00.0|2017-01-01 12:02:00|
|b   |2017-02-01 11:22:30  |2017-02-01 11:22:30|
+----+---------------------+-------------------+


scala> a.withColumn("datetime", date_format(col("colb"), "yyyy-MM-dd HH:mm:ss")).printSchema
root
 |-- cola: string (nullable = true)
 |-- colb: string (nullable = true)
 |-- datetime: string (nullable = true)

In the above, we get the right timestamp format, but when we print the schema, the datetime field is of type string, whereas I need a timestamp type here.

Now, if I attempt to cast the field to timestamp, the format is set to microsecond precision, which is not what I want.

scala> import org.apache.spark.sql.types._
import org.apache.spark.sql.types._

scala> val a = sc.parallelize(List(("a", "2017-01-01 12:02:00.0"), ("b", "2017-02-01 11:22:30"))).toDF("cola", "colb")
a: org.apache.spark.sql.DataFrame = [cola: string, colb: string]

scala> a.withColumn("datetime", date_format(col("colb").cast(TimestampType), "yyyy-MM-dd HH:mm:ss").cast(TimestampType)).show(false)
+----+---------------------+---------------------+
|cola|colb                 |datetime             |
+----+---------------------+---------------------+
|a   |2017-01-01 12:02:00.0|2017-01-01 12:02:00.0|
|b   |2017-02-01 11:22:30  |2017-02-01 11:22:30.0|
+----+---------------------+---------------------+


scala> a.withColumn("datetime", date_format(col("colb").cast(TimestampType), "yyyy-MM-dd HH:mm:ss").cast(TimestampType)).printSchema
root
 |-- cola: string (nullable = true)
 |-- colb: string (nullable = true)
 |-- datetime: timestamp (nullable = true)

What I expect is for the format to be yyyy-MM-dd HH:mm:ss and for the data type of the field to be timestamp. Thanks in advance.

Recommended answer

You can use unix_timestamp to convert the string date time to timestamp.

unix_timestamp(Column s, String p): converts a time string with the given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to a Unix timestamp (in seconds); returns null on failure.
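
To make the semantics described above concrete, here is a minimal plain-JVM sketch of pattern-based parsing to epoch seconds. It is not Spark's implementation: `toUnixSeconds` is a hypothetical helper introduced only for illustration, `Option` stands in for Spark's null result, and the time zone is pinned to UTC as an assumption so the numbers are reproducible (Spark itself uses its own configured time zone).

```scala
import java.text.{ParseException, SimpleDateFormat}
import java.util.TimeZone

// Hypothetical helper, not part of Spark: mimics the documented behaviour of
// unix_timestamp(Column, String) for a single string value.
def toUnixSeconds(s: String, pattern: String): Option[Long] = {
  val fmt = new SimpleDateFormat(pattern)
  fmt.setTimeZone(TimeZone.getTimeZone("UTC")) // assumption: fixed zone for reproducible results
  try Some(fmt.parse(s).getTime / 1000L)       // milliseconds since the epoch -> whole seconds
  catch { case _: ParseException => None }     // Spark returns null here; None stands in for it
}

val pattern = "yyyy-MM-dd HH:mm:ss"
println(toUnixSeconds("2017-01-01 12:02:00.0", pattern)) // Some(1483272120)
println(toUnixSeconds("2017-02-01 11:22:30", pattern))   // Some(1485948150)
println(toUnixSeconds("not a date", pattern))            // None
```

With `SimpleDateFormat`, the trailing `.0` in the first input is simply left unparsed, because `parse` is not required to consume the whole string. Stricter parsers may reject such input, so it is worth checking the result for nulls on your own Spark version.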

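import org.apache.spark.sql.functions.unix_timestamp
val format = "yyyy-MM-dd HH:mm:ss"
// unix_timestamp returns seconds as a long; cast it to get a timestamp-typed column
dataframe.withColumn("column_A", unix_timestamp($"date", format).cast("timestamp"))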
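
One point that may explain the trailing `.0` in the question's output: a timestamp value carries no display format of its own. The sketch below uses only `java.sql.Timestamp`, the JVM class that Spark's TimestampType has traditionally been exposed as, and assumes (this is not confirmed by the thread) that the `show` output in the question reflects that class's `toString`. The variable names are illustrative.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

// A timestamp with whole-second precision: the fractional part is zero.
val ts = Timestamp.valueOf("2017-01-01 12:02:00")
println(ts.getNanos) // 0

// Timestamp.toString always prints at least one fractional digit.
println(ts) // 2017-01-01 12:02:00.0

// A display format only exists once the value is rendered as a string.
val display = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts)
println(display) // 2017-01-01 12:02:00
```

If that assumption holds, the column can stay a timestamp when written to Parquet, and `date_format` is needed only where a string rendering is wanted.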

Hope this helps!
