weekofyear() returning seemingly incorrect results for January 1


Question

I'm not quite sure why my code gives 52 as the answer for: weekofyear("01/JAN/2017") .

Does anyone have a possible explanation for this? Is there a better way to do this?

from pyspark.sql import SparkSession, functions
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName('weekOfYear').getOrCreate()

df = spark.createDataFrame(
    [(1, "01/JAN/2017"), (2, "15/FEB/2017")], ("id", "date"))

df.show()
+---+-----------+
| id|       date|
+---+-----------+
|  1|01/JAN/2017|
|  2|15/FEB/2017|
+---+-----------+

Calculate the week of the year:

df=df.withColumn("weekofyear", functions.weekofyear(to_date(df["date"],"dd/MMM/yyyy")))

df.printSchema()

root
 |-- id: long (nullable = true)
 |-- date: string (nullable = true)
 |-- weekofyear: integer (nullable = true)

df.show()

错误"在下面可见:

+---+-----------+----------+
| id|       date|weekofyear|
+---+-----------+----------+
|  1|01/JAN/2017|        52|
|  2|15/FEB/2017|         7|
+---+-----------+----------+

Answer

It seems like weekofyear() will only return 1 for January 1st if the day of the week is Monday through Thursday.

To confirm, I created a DataFrame with all "01/JAN/YYYY" from 1900 to 2018:

# one row per year, holding the string "01/JAN/YYYY"
df = spark.createDataFrame(
    [(1, "01/JAN/{y}".format(y=year)) for year in range(1900, 2019)],
    ["id", "date"]
)

Now let's convert it to a date, get the day of the week, and count the values for weekofyear():

import pyspark.sql.functions as f
df.withColumn("d", f.to_date(f.from_unixtime(f.unix_timestamp('date', "dd/MMM/yyyy"))))\
    .withColumn("weekofyear", f.weekofyear("d"))\
    .withColumn("dayofweek", f.date_format("d", "E"))\
    .groupBy("dayofweek", "weekofyear")\
    .count()\
    .show()
#+---------+----------+-----+
#|dayofweek|weekofyear|count|
#+---------+----------+-----+
#|      Sun|        52|   17|
#|      Mon|         1|   18|
#|      Tue|         1|   17|
#|      Wed|         1|   17|
#|      Thu|         1|   17|
#|      Fri|        53|   17|
#|      Sat|        53|    4|
#|      Sat|        52|   12|
#+---------+----------+-----+

Note, I am using Spark v 2.1 where to_date() does not accept a format argument, so I had to use the method described in this answer to convert the string to a date.
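
For reference, to_date() in Spark 2.2 and later does accept a format string directly, so the same check could be written without the unix_timestamp round trip. A minimal sketch, assuming that newer Spark version:

import pyspark.sql.functions as f

# Assumes Spark 2.2+, where to_date() takes an optional format argument.
df.withColumn("d", f.to_date("date", "dd/MMM/yyyy"))\
    .withColumn("weekofyear", f.weekofyear("d"))\
    .withColumn("dayofweek", f.date_format("d", "E"))\
    .groupBy("dayofweek", "weekofyear")\
    .count()\
    .show()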

Similarly, weekofyear() only returns 1 for (see the quick check after the list):

  • January 2nd, if the day of the week is Monday through Friday.
  • January 3rd, if the day of the week is Monday through Saturday.
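
A minimal sketch of that check, using Python's standard library instead of Spark (datetime.isocalendar() follows the same ISO week numbering):

from datetime import date
from collections import Counter

# For each year 1900-2018, pair January 2nd's weekday with its ISO week number.
# It lands in week 1 exactly when it falls on Monday through Friday; on Saturday
# or Sunday it belongs to week 52 or 53 of the previous year.
jan2 = [date(y, 1, 2) for y in range(1900, 2019)]
print(Counter((d.strftime("%a"), d.isocalendar()[1]) for d in jan2))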

Update

This behavior is consistent with the ISO 8601 definition.
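
Under ISO 8601, week 1 is the week (Monday through Sunday) containing the year's first Thursday, so January 1 falls in week 1 only when it is a Monday through Thursday. January 1, 2017 was a Sunday, which is why it is reported as week 52 of 2016. A quick sanity check with Python's standard library, which uses the same ISO rule:

from datetime import date

# isocalendar() returns (ISO year, ISO week, ISO weekday).
print(date(2017, 1, 1).isocalendar())  # (2016, 52, 7) -- Sunday of 2016's week 52
print(date(2017, 1, 2).isocalendar())  # (2017, 1, 1)  -- Monday, week 1 of 2017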
