How to drop rows based on timestamp where hours are not in list
Question
I have a large dataframe (several million rows) where one of my columns is a timestamp (labeled 'Timestamp') in the format "hh:mm:ss", e.g. "07:00:04". I want to drop the rows where the hour is NOT between 7 and 21 inclusive.
I have tried converting the timestamps to strings and slicing them, but I could not get it working, and I believe there should be a more efficient way.
# Create list of opening hours (rows in these hours should not be dropped)
opening_hour = 7
closing_hour = 21
trading_hours = []
for hour in range(closing_hour - opening_hour + 1):
    add_hour = opening_hour + hour
    trading_hours.append(add_hour)
My dataframe looks something like this:
       Date Timestamp      Close
0  20180102  07:05:00  12925.979
1  20180102  21:05:02  12925.479
2  20180102  22:05:04  12925.280
3  20180102  23:55:06  12925.479
4  20180102  06:05:07  12925.780
5  20180103  07:05:07  12925.780
[...]
I want to drop the rows with index 2, 3 and 4 (there are several thousand such rows in the full data), so the result should be something like:
       Date Timestamp      Close
0  20180102  07:05:00  12925.979
1  20180102  21:05:02  12925.479
2  20180103  07:05:07  12925.780
[...]
Recommended answer
First, you can give your DataFrame a proper DatetimeIndex as follows:
import pandas as pd

# Combine the date and time columns into one string per row, then parse it
dtidx = pd.DatetimeIndex(df['Date'].astype(str) + ' ' + df['Timestamp'].astype(str))
df.index = dtidx
Then use between_time to keep the rows in hours 07 through 21. Both endpoints are included by default, so a row stamped exactly 22:00:00 would also be kept:
df.between_time('07:00', '22:00')
# returns
                         Date Timestamp    Close
2018-01-02 07:05:00  20180102  07:05:00  12926
2018-01-02 21:05:02  20180102  21:05:02  12925.5
2018-01-03 07:05:07  20180103  07:05:07  12925.8
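If you would rather keep the original integer index and filter against the list of hours from the question, something along these lines may also work. This is only a sketch, not part of the answer above: it rebuilds the sample data from the question, assumes the Timestamp column holds zero-padded "hh:mm:ss" values, and the names hours and filtered are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": [20180102, 20180102, 20180102, 20180102, 20180102, 20180103],
    "Timestamp": ["07:05:00", "21:05:02", "22:05:04",
                  "23:55:06", "06:05:07", "07:05:07"],
    "Close": [12925.979, 12925.479, 12925.280,
              12925.479, 12925.780, 12925.780],
})

opening_hour, closing_hour = 7, 21
# Hours to keep: 7, 8, ..., 21 (same list the question builds with a loop).
trading_hours = list(range(opening_hour, closing_hour + 1))

# Extract the hour of each row (assumes "hh:mm:ss" formatted values).
hours = pd.to_datetime(df["Timestamp"].astype(str), format="%H:%M:%S").dt.hour

# Keep only rows whose hour is in the list, then renumber the index from 0.
filtered = df[hours.isin(trading_hours)].reset_index(drop=True)
print(filtered)
```

Because the mask only looks at the hour, every time from 07:00:00 through 21:59:59 is kept and 22:00:00 is dropped, which avoids the endpoint question that comes up with between_time.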