weekofyear() returning seemingly incorrect results for January 1
Problem description
I'm not quite sure why my code gives 52 as the answer for weekofyear("01/JAN/2017").
Does anyone have a possible explanation for this? Is there a better way to do this?
from pyspark.sql import SparkSession, functions
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName('weekOfYear').getOrCreate()
df = spark.createDataFrame(
[(1, "01/JAN/2017"), (2, "15/FEB/2017")], ("id", "date"))
df.show()
+---+-----------+
| id| date|
+---+-----------+
| 1|01/JAN/2017|
| 2|15/FEB/2017|
+---+-----------+
Calculating the week of the year:
df = df.withColumn("weekofyear", functions.weekofyear(to_date(df["date"], "dd/MMM/yyyy")))
df.printSchema()
root
|-- id: long (nullable = true)
|-- date: string (nullable = true)
|-- weekofyear: integer (nullable = true)
df.show()
The "wrong" result looks like this:
+---+-----------+----------+
| id| date|weekofyear|
+---+-----------+----------+
| 1|01/JAN/2017| 52|
| 2|15/FEB/2017| 7|
+---+-----------+----------+
Answer
It seems like weekofyear() will only return 1 for January 1st if the day of the week is Monday through Thursday.
To confirm, I created a DataFrame with all "01/JAN/YYYY" dates from 1900 to 2018:
# spark is the SparkSession created in the question (the original used a SQLContext named sqlCtx)
df = spark.createDataFrame(
    [(1, "01/JAN/{y}".format(y=year),) for year in range(1900, 2019)],
    ["id", "date"]
)
Now let's convert it to a date, get the day of the week, and count the values of weekofyear():
import pyspark.sql.functions as f
df.withColumn("d", f.to_date(f.from_unixtime(f.unix_timestamp('date', "dd/MMM/yyyy"))))\
.withColumn("weekofyear", f.weekofyear("d"))\
.withColumn("dayofweek", f.date_format("d", "E"))\
.groupBy("dayofweek", "weekofyear")\
.count()\
.show()
#+---------+----------+-----+
#|dayofweek|weekofyear|count|
#+---------+----------+-----+
#| Sun| 52| 17|
#| Mon| 1| 18|
#| Tue| 1| 17|
#| Wed| 1| 17|
#| Thu| 1| 17|
#| Fri| 53| 17|
#| Sat| 53| 4|
#| Sat| 52| 12|
#+---------+----------+-----+
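The same grouping can be reproduced without Spark as a sanity check. This sketch uses Python's `date.isocalendar()` (ISO 8601 weeks) in place of `weekofyear()`, and `strftime("%a")` in place of `date_format(..., "E")`; the `%a` abbreviations assume the default C/English locale. It is an illustration of the convention, not the answer's Spark pipeline.

```python
from collections import Counter
from datetime import date

# Count (weekday, ISO week) pairs for January 1 of every year from 1900 to 2018,
# mirroring groupBy("dayofweek", "weekofyear").count() in the Spark code above
counts = Counter(
    (date(y, 1, 1).strftime("%a"), date(y, 1, 1).isocalendar()[1])
    for y in range(1900, 2019)
)

for (day, week), n in sorted(counts.items()):
    print(day, week, n)
```

The counts agree with the Spark table: January 1 is week 1 only when it falls on Monday through Thursday, and otherwise it belongs to week 52 or 53 of the previous ISO year.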
Note: I am using Spark v2.1, where to_date() does not accept a format argument (that parameter was added in Spark 2.2), so I had to use the method described in this answer to convert the string to a date.
Similarly, weekofyear() only returns 1 for:
- January 2nd, if the day of the week is Monday through Friday.
- January 3rd, if the day of the week is Monday through Saturday.
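Both rules can be checked the same way over 1900 to 2018. The sketch below is hypothetical (the function name `weekdays_in_week1` is made up for illustration) and uses Python's ISO 8601 `date.isocalendar()` as a stand-in for Spark's `weekofyear()`; `date.weekday()` returns 0 for Monday through 6 for Sunday, so the day names do not depend on locale.

```python
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekdays_in_week1(day_of_month):
    """Hypothetical helper: weekday names for which January `day_of_month`
    falls in ISO week 1, over the years 1900-2018."""
    result = set()
    for y in range(1900, 2019):
        d = date(y, 1, day_of_month)
        if d.isocalendar()[1] == 1:
            # weekday() is 0 for Monday ... 6 for Sunday
            result.add(DAY_NAMES[d.weekday()])
    return result

for day in (1, 2, 3, 4):
    print(day, sorted(weekdays_in_week1(day), key=DAY_NAMES.index))
```

The pattern continues to January 4th, which is in week 1 whatever day of the week it falls on.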
Update
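This behavior is consistent with the ISO 8601 week-numbering definition: weeks start on Monday, and week 1 is the week that contains the year's first Thursday (equivalently, the week containing January 4th). January 1, 2017 was a Sunday, so it belongs to the last week of 2016, which is week 52.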
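Because an ISO week belongs to the year that contains its Thursday, the week number can be computed by hand: move to the Thursday of the date's Monday-to-Sunday week, take that Thursday's year as the ISO year, and divide its day-of-year by 7, rounding up. The sketch below (hypothetical function name `iso_week_number`) implements that rule and is meant only to illustrate the definition; it is not Spark's implementation, and it is cross-checked against Python's `date.isocalendar()`.

```python
from datetime import date, timedelta

def iso_week_number(d):
    """Hypothetical helper: (ISO year, ISO week) of `d`, computed from the Thursday rule."""
    # Shift to the Thursday of d's Monday-to-Sunday week (weekday(): Mon=0 ... Sun=6)
    thursday = d + timedelta(days=3 - d.weekday())
    # The Thursday's calendar year is the ISO year; its day-of-year fixes the week
    day_of_year = thursday.timetuple().tm_yday
    return thursday.year, (day_of_year - 1) // 7 + 1

print(iso_week_number(date(2017, 1, 1)))   # (2016, 52)
print(iso_week_number(date(2017, 2, 15)))  # (2017, 7)
```

For 2017-01-01 the Thursday of its week is 2016-12-29, day 364 of the leap year 2016, which gives (364 - 1) // 7 + 1 = 52.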