Convert string with form "MM/dd/yyyy HH:mm" to joda datetime in dataframe in Spark
Question
I'm reading in CSV files where one column contains a string that should be converted to a datetime. The string is in the form MM/dd/yyyy HH:mm. However, when I try to transform it using joda-time, I always get the error:
线程"main"中的异常java.lang.UnsupportedOperationException:不支持org.joda.time.DateTime类型的架构
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported
I don't know what exactly the problem is...
import org.joda.time.DateTime
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

val input = sc.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  val date: DateTime = formatter.parseDateTime(p(0))
  StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()
Can someone help?
Accepted answer
I don't know what exactly the problem is...
Well, the source of the problem is pretty much described by the error message: Spark SQL doesn't support Joda-Time DateTime as an input. A valid input for a date field is java.sql.Date (see the Spark SQL and DataFrame Guide, Data Types, for reference).
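For illustration, here is a minimal sketch (assuming a SparkSession named spark is in scope; on Spark 1.x you would import sqlContext.implicits._ instead) showing that java.sql.Date and java.sql.Timestamp fields map to Spark SQL's DateType and TimestampType:

import java.sql.{Date, Timestamp}
import spark.implicits._

case class Event(day: Date, at: Timestamp)

Seq(Event(Date.valueOf("2015-08-20"), Timestamp.valueOf("2015-08-20 10:30:00")))
  .toDF()
  .printSchema()
// root
//  |-- day: date (nullable = true)
//  |-- at: timestamp (nullable = true)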
The simplest solution is to adjust the StockData class so it takes java.sql.Date as an argument, and replace:
val date: DateTime = formatter.parseDateTime(p(0))
with something like this:
val date: java.sql.Date = new java.sql.Date(
formatter.parseDateTime(p(0)).getMillis)
or
val date: java.sql.Timestamp = new java.sql.Timestamp(
formatter.parseDateTime(p(0)).getMillis)
if you want to preserve hours and minutes (in which case StockData should take a java.sql.Timestamp instead).
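Putting the pieces together, a minimal sketch of the adjusted pipeline (the StockData field names below are assumptions, since the question doesn't show the class definition; the types follow the question's code):

import java.sql.Timestamp
import org.joda.time.format.DateTimeFormat

// Field names are illustrative; the types match the question's constructor call.
case class StockData(date: Timestamp, open: Double, high: Double,
                     low: Double, close: Double, volume: Int, trades: Int)

val input = sc.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  // Create the formatter inside the closure to avoid serialization issues;
  // DateTimeFormat.forPattern caches patterns internally, so this is cheap.
  val formatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  StockData(
    new Timestamp(formatter.parseDateTime(p(0)).getMillis),
    p(1).toDouble, p(2).toDouble, p(3).toDouble,
    p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()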
If you're thinking about using window functions with a range clause, a better option is to pass the string to the DataFrame and convert it to an integer timestamp:
import org.apache.spark.sql.functions.unix_timestamp
df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
See Spark Window Functions - rangeBetween dates for details.