Combining two columns, casting to timestamp and selecting from df causes no error, but casting one column to timestamp and selecting causes an error
Problem description
When I select a column that is passed through unix_timestamp and then cast to timestamp from a dataframe, I get a Spark AnalysisException. See the link below.
However, when I combine two columns, pass the combination through unix_timestamp, cast it to timestamp type, and then select it from the df, there is no error.
Error: How to extract year from a date string?
No error:
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val spark: SparkSession = SparkSession.builder().
appName("myapp").master("local").getOrCreate()
case class Person(id: Int, date: String, time:String)
import spark.implicits._
val mydf: DataFrame = Seq(Person(1,"9/16/13", "11:11:11")).toDF()
//solution.show()
// column modification
val datecol: Column = mydf("date")
val timecol: Column = mydf("time")
val newcol: Column = unix_timestamp(concat(datecol,lit(" "),timecol),"MM/dd/yy").cast(TimestampType)
mydf.select(newcol).show()
Results
Expected: an AnalysisException, because unix_timestamp(concat(....)) cannot be found in mydf
Actual:
+------------------------------------------------------------------+
|CAST(unix_timestamp(concat(date, , time), MM/dd/yy) AS TIMESTAMP)|
+------------------------------------------------------------------+
| 2013-09-16 00:00:...|
Recommended answer
The two cases are not inconsistent with each other. In the erroneous case, you selected from a new dataframe whose column name had changed. See below:
val select_df: DataFrame = mydf.select(unix_timestamp(mydf("date"),"MM/dd/yy").cast(TimestampType))
select_df.select(year($"date")).show()
Here, the select_df dataframe no longer has a column named date. Its only column carries an auto-generated name, something like CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP), so year($"date") cannot be resolved against it.
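As a rough sketch of the renaming described here, the snippet below prints the column names of the selected dataframe and then gives the cast column an explicit alias so it can be referenced afterwards. The object name AliasExample, the tuple-based test row, and the pattern "M/d/yy" (used instead of "MM/dd/yy" because newer Spark versions parse patterns more strictly) are assumptions made for illustration, not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType

object AliasExample {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("alias-example").master("local").getOrCreate()
    import spark.implicits._

    // Same shape as the question's data, built from a tuple for brevity.
    val mydf: DataFrame = Seq((1, "9/16/13", "11:11:11")).toDF("id", "date", "time")

    // No alias: the single result column gets a generated name, not "date".
    val unnamed = mydf.select(unix_timestamp(mydf("date"), "M/d/yy").cast(TimestampType))
    println(unnamed.columns.mkString(", "))

    // Explicit alias (reusing the name "date" is a choice made here):
    // the column can now be referenced by name in a later select.
    val named = mydf.select(
      unix_timestamp(mydf("date"), "M/d/yy").cast(TimestampType).alias("date"))
    named.select(year($"date")).show()

    spark.stop()
  }
}
```

Aliasing is only one way around the problem; withColumn on the original dataframe would also keep a known column name.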
In the case you describe in the question, on the other hand, you are only defining a new column expression when you write:
val newcol: Column = unix_timestamp(concat(datecol,lit(" "),timecol),"MM/dd/yy").cast(TimestampType)
You then use it to select from the original dataframe, where the date and time columns still exist, so it gives the expected result.
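To make the distinction concrete, here is a minimal sketch, assuming a local Spark session. It shows that a Column is only an expression until it is selected against a dataframe that contains the columns it refers to. The object name ColumnExpressionExample, the alias "ts", and the pattern "M/d/yy H:m:s" (chosen here so the time part is parsed as well) are illustrative assumptions rather than anything from the question.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType
import scala.util.Try

object ColumnExpressionExample {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .appName("column-expression-example").master("local").getOrCreate()
    import spark.implicits._

    val mydf: DataFrame = Seq((1, "9/16/13", "11:11:11")).toDF("id", "date", "time")

    // Building the Column does not touch any data; it only describes an expression.
    val newcol: Column =
      unix_timestamp(concat(mydf("date"), lit(" "), mydf("time")), "M/d/yy H:m:s")
        .cast(TimestampType)
    println(newcol)

    // Selecting from mydf works because "date" and "time" exist there.
    val result: DataFrame = mydf.select(newcol.alias("ts"))
    result.printSchema()

    // The result has only "ts", so a reference to "date" fails during analysis.
    val attempt = Try(result.select(year($"date")))
    println(s"Selecting date from result failed: ${attempt.isFailure}")

    spark.stop()
  }
}
```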
Hope this makes things clearer.