How to extract year from a date string?
Problem description
I'm using Spark 2.1.2.
I'm working with datetime data and would like to get the year from a dt string using Spark SQL functions.
The code I use is as follows:
import org.apache.spark.sql.functions._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
val spark: SparkSession = SparkSession.builder().
appName("myapp").master("local").getOrCreate()
case class Person(id: Int, date: String)
import spark.implicits._
val mydf: DataFrame = Seq(Person(1,"9/16/13")).toDF()
val select_df: DataFrame = mydf.select(unix_timestamp(mydf("date"),"MM/dd/yy").cast(TimestampType))
select_df.select(year($"date")).show()
I expect the year of the date to be 13 in the example above.
Actual: org.apache.spark.sql.AnalysisException: cannot resolve 'date' given input columns: [CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP)];;
'Project [year('date) AS year(date)#11]
Recommended answer
case class Person(id: Int, date: String)
val mydf = Seq(Person(1,"9/16/13")).toDF
val solution = mydf.withColumn("year", year(to_timestamp($"date", "MM/dd/yy")))
scala> solution.show
+---+-------+----+
| id|   date|year|
+---+-------+----+
|  1|9/16/13|2013|
+---+-------+----+
It looks like year gives you four digits rather than two. I'm leaving the string truncation as a home exercise for you :)
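One possible way to get from the four-digit year to the two-digit 13 is sketched below. It is only an illustration, not the answer's own solution. It uses unix_timestamp(...).cast(TimestampType) rather than to_timestamp because, as far as I know, to_timestamp was only added in Spark 2.2, while the question uses 2.1.2. The pattern "M/d/yy" (instead of the question's "MM/dd/yy") and the column names yy_text and yy_int are assumptions made for this sketch; newer Spark versions may reject a one-digit month under "MM".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType

val spark = SparkSession.builder().appName("myapp").master("local").getOrCreate()
import spark.implicits._

val mydf = Seq((1, "9/16/13")).toDF("id", "date")

// Parse the string into a timestamp. "M/d/yy" is an assumption of this sketch,
// chosen so that the one-digit month in "9/16/13" is accepted.
val ts = unix_timestamp($"date", "M/d/yy").cast(TimestampType)

val twoDigit = mydf
  .withColumn("yy_text", date_format(ts, "yy")) // two-digit year as a string
  .withColumn("yy_int", year(ts) % 100)         // two-digit year as an integer

twoDigit.show()
```

date_format keeps the result as text, which preserves a leading zero (for example 09), whereas the modulo version yields a plain integer.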
Actual: org.apache.spark.sql.AnalysisException: cannot resolve 'date' given input columns: [CAST(unix_timestamp(date, MM/dd/yy) AS TIMESTAMP)];; 'Project [year('date) AS year(date)#11]
The reason for the exception is that you are trying to access the "old" date column (in select(year($"date"))), which is no longer available after the earlier select (select(unix_timestamp(mydf("date"),"MM/dd/yy").cast(TimestampType))). As the error message shows, that select leaves a single column with an auto-generated name.
You could use alias or as to change the weird-looking auto-generated name into something else, like date again, and that would work.
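A minimal sketch of the alias approach follows, keeping the question's unix_timestamp and cast(TimestampType) steps. The tuple-based test data, the pattern "M/d/yy" (used instead of "MM/dd/yy" so the one-digit month parses on newer Spark versions too), and the names selectDf and years are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.TimestampType

val spark = SparkSession.builder().appName("myapp").master("local").getOrCreate()
import spark.implicits._

val mydf = Seq((1, "9/16/13")).toDF("id", "date")

// Name the converted column "date" explicitly, so the next select can
// resolve $"date" instead of failing on the auto-generated column name.
val selectDf = mydf.select(
  unix_timestamp(mydf("date"), "M/d/yy").cast(TimestampType).as("date"))

// year now resolves against the aliased timestamp column.
val years = selectDf.select(year($"date").as("year"))
years.show()
```

Using as("date") and alias("date") should be interchangeable here. Note that this select drops the id column; withColumn, as in the answer above, keeps the other columns.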