Get Last Monday in Spark
Question
I am using Spark 2.0 with the Python API.
I have a dataframe with a column of type DateType(). I would like to add a column to the dataframe containing the most recent Monday.
I can do it like this:
import pyspark.sql.functions
import pyspark.sql.types

# `spark` is an existing SparkSession and `path_to_file` points to the CSV input.
reg_schema = pyspark.sql.types.StructType([
    pyspark.sql.types.StructField('AccountCreationDate', pyspark.sql.types.DateType(), True),
    pyspark.sql.types.StructField('UserId', pyspark.sql.types.LongType(), True)
])
reg = spark.read.schema(reg_schema).option('header', True).csv(path_to_file)
# One branch per weekday: subtract the number of days elapsed since Monday.
reg = reg.withColumn('monday',
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Mon',
        reg.AccountCreationDate).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Tue',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 1)).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Wed',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 2)).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Thu',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 3)).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Fri',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 4)).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Sat',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 5)).otherwise(
    pyspark.sql.functions.when(pyspark.sql.functions.date_format(reg.AccountCreationDate, 'E') == 'Sun',
        pyspark.sql.functions.date_sub(reg.AccountCreationDate, 6))
    )))))))
However, this seems like a lot of code for something that should be rather simple. Is there a more concise way of doing this?
Recommended answer
You can use next_day to find the first Monday after the given date, then subtract a week. The required functions can be imported as follows:
from pyspark.sql.functions import next_day, date_sub
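Then define a helper:
def previous_day(date, dayOfWeek):
    # next_day returns the first dayOfWeek strictly after `date`,
    # so subtracting 7 days gives the most recent one on or before `date`.
    return date_sub(next_day(date, dayOfWeek), 7)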
Finally, an example:
from pyspark.sql.functions import to_date
# `sc` is an existing SparkContext.
df = sc.parallelize([
    ("2016-10-26", )
]).toDF(["date"]).withColumn("date", to_date("date"))
df.withColumn("last_monday", previous_day("date", "monday")).show()
Result:
+----------+-----------+
|      date|last_monday|
+----------+-----------+
|2016-10-26| 2016-10-24|
+----------+-----------+
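To see why subtracting a week works even when the date is already a Monday, the same arithmetic can be sketched in plain Python with the standard datetime module. The helpers next_day_py and previous_day_py below are hypothetical stand-ins written for illustration only; they are not part of PySpark, and they assume that next_day returns the first matching weekday strictly after the input date.

```python
from datetime import date, timedelta

# Hypothetical pure-Python stand-ins for Spark's next_day / date_sub,
# used only to illustrate the arithmetic; they are not PySpark functions.
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

def next_day_py(d, day_of_week):
    """Return the first date strictly after d that falls on day_of_week."""
    target = DAYS.index(day_of_week.lower())
    # Advance by 1..7 days, never 0, so a matching day rolls to next week.
    delta = (target - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=delta)

def previous_day_py(d, day_of_week):
    """Return the most recent day_of_week on or before d."""
    return next_day_py(d, day_of_week) - timedelta(days=7)

# Wednesday, Monday and Sunday of the same week all map to the same Monday.
for d in (date(2016, 10, 26), date(2016, 10, 24), date(2016, 10, 30)):
    print(d, previous_day_py(d, "monday"))
```

Because the "next" day is always at least one day ahead, a Monday input maps to itself after the subtraction, which matches the behaviour of the when/otherwise chain in the question.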