pyspark check if HH:mm:ss is in a range
Question
I have some data that looks like this.
time
08:28:24
22:20:54
12:59:38
21:46:07
I want to select the times that fall between 16:00:00 and 23:59:59; this is a closed range.
What should I do? (The 'time' column is of string type.)
Thanks!
Recommended Answer
Your condition can be simplified to checking whether the hour part of your time column is between 16 and 23.
You can get the hour by using pyspark.sql.functions.split to tokenize the time column on the : character. Extract the token at index 0 to get the hour, and make the comparison using pyspark.sql.Column.between() (which is inclusive of the bounds).
from pyspark.sql.functions import split
df.where(split("time", ":")[0].between(16, 23)).show()
#+--------+
#| time|
#+--------+
#|22:20:54|
#|21:46:07|
#+--------+
Note that even though split returns a string, it is implicitly converted to int for the between comparison.
Of course, this could be extended if you had more complicated filtering criteria that also involve looking at minutes or seconds:
df.select(
"*",
split("time", ":")[0].cast("int").alias("hour"),
split("time", ":")[1].cast("int").alias("minute"),
split("time", ":")[2].cast("int").alias("second")
).show()
#+--------+----+------+------+
#| time|hour|minute|second|
#+--------+----+------+------+
#|08:28:24| 8| 28| 24|
#|22:20:54| 22| 20| 54|
#|12:59:38| 12| 59| 38|
#|21:46:07| 21| 46| 7|
#+--------+----+------+------+