Convert string with form "MM/dd/yyyy HH:mm" to joda datetime in dataframe in Spark


Problem description

I'm reading in CSV files where one column contains a string that should be converted to a datetime. The string is in the form MM/dd/yyyy HH:mm. However, when I try to transform it using joda-time, I always get the error:

线程"main"中的异常java.lang.UnsupportedOperationException:不支持org.joda.time.DateTime类型的架构

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.joda.time.DateTime is not supported

I don't know what exactly the problem is...

 val input = c.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map{p => 
      val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm");
      val date: DateTime = formatter.parseDateTime(p(0));
      StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble, p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()

Can someone help?

Answer

I don't know what exactly the problem is...

Well, the source of the problem is pretty much described by an error message. Spark SQL doesn't support Joda-Time DateTime as an input. A valid input for a date field is java.sql.Date (see Spark SQL and DataFrame Guide, Data Types for reference).

The simplest solution is to adjust the StockData class so it takes java.sql.Date as an argument, and replace:

val date: DateTime = formatter.parseDateTime(p(0))

with something like this:

val date: java.sql.Date = new java.sql.Date(
  formatter.parseDateTime(p(0)).getMillis)

or

val date: java.sql.Timestamp = new java.sql.Timestamp(
  formatter.parseDateTime(p(0)).getMillis)

if you want to preserve hour / minutes.
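Putting it together, the adjusted pipeline might look like the following sketch. The field names of StockData are assumptions (only the constructor call appears in the question), and the formatter is created inside the closure, as in the original code, because Joda-Time's DateTimeFormatter is not serializable:

```scala
import java.sql.Timestamp
import org.joda.time.format.{DateTimeFormat, DateTimeFormatter}

// Field names are assumed; only the constructor arguments appear in the question.
case class StockData(date: Timestamp, open: Double, high: Double,
                     low: Double, close: Double, volume: Int, trades: Int)

val input = sc.textFile("C:\\Users\\AAPL.csv").map(_.split(",")).map { p =>
  val formatter: DateTimeFormatter = DateTimeFormat.forPattern("MM/dd/yyyy HH:mm")
  // Bridge from Joda-Time to a Spark-supported type via epoch milliseconds
  val date = new Timestamp(formatter.parseDateTime(p(0)).getMillis)
  StockData(date, p(1).toDouble, p(2).toDouble, p(3).toDouble,
            p(4).toDouble, p(5).toInt, p(6).toInt)
}.toDF()
```

With java.sql.Timestamp in the case class, Spark SQL infers a TimestampType column and the schema error goes away.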

If you plan to use window functions with a range clause, a better option is to keep the string in the DataFrame and convert it to an integer timestamp:

import org.apache.spark.sql.functions.unix_timestamp

df.withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
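The resulting ts column holds seconds since the epoch, which is what a range frame needs. If you later want a proper timestamp column as well, you can cast it back; a sketch, assuming a DataFrame df with a string column named date:

```scala
import org.apache.spark.sql.functions.unix_timestamp

val withTs = df
  // Parse the string into epoch seconds (LongType), usable in rangeBetween frames
  .withColumn("ts", unix_timestamp($"date", "MM/dd/yyyy HH:mm"))
  // Optional: recover a TimestampType column from the epoch seconds
  .withColumn("timestamp", $"ts".cast("timestamp"))
```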

See Spark Window Functions - rangeBetween dates for details.

