Pandas dataframe: omit weekends and days near holidays
Problem description
I have a Pandas dataframe with a DatetimeIndex and some other columns, similar to this:
import pandas as pd
import numpy as np
# date_index avoids shadowing the built-in range(); pandas >= 2.2 prefers freq='6h'
date_index = pd.date_range('2017-12-01', '2018-01-05', freq='6H')
df = pd.DataFrame(index=date_index)
# Average speed in miles per hour
df['value'] = np.random.randint(low=0, high=60, size=len(df.index))
df.info()
# DatetimeIndex: 141 entries, 2017-12-01 00:00:00 to 2018-01-05 00:00:00
# Freq: 6H
# Data columns (total 1 columns):
# value 141 non-null int64
# dtypes: int64(1)
# memory usage: 2.2 KB
df.head(10)
# value
# 2017-12-01 00:00:00 15
# 2017-12-01 06:00:00 54
# 2017-12-01 12:00:00 19
# 2017-12-01 18:00:00 13
# 2017-12-02 00:00:00 35
# 2017-12-02 06:00:00 31
# 2017-12-02 12:00:00 58
# 2017-12-02 18:00:00 6
# 2017-12-03 00:00:00 8
# 2017-12-03 06:00:00 30
How can I select or filter the entries that are:
1. Weekdays only (that is, not Saturday or Sunday)
2. Not within N days of the dates in a list (e.g. U.S. holidays like '12-25' or '01-01')?
I was hoping for something like this:
df = exclude_Sat_and_Sun(df)
omit_days = ['12-25', '01-01']
N = 3 # days near the holidays
df = exclude_days_near_omit_days(df, N, omit_days)
I was thinking of creating a new column to break out the month and day and then comparing them to criteria 1 and 2 above. However, I was hoping for something more Pythonic using the DatetimeIndex.
Thanks for your help.
Recommended answer
The first part can be easily accomplished using the Pandas DatetimeIndex.dayofweek property, which numbers the days of the week from Monday as 0 to Sunday as 6.
df[df.index.dayofweek < 5]
will give you the weekdays only.
For the second part you can use the datetime module. Below I give an example for only one date, namely 2017-12-25. You can easily generalize it to a list of dates, for example by defining a helper function.
from datetime import datetime, timedelta
N = 3
df[abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N)]
This will give all entries that are more than N=3 days away from 2017-12-25. That is, it will exclude the 7-day interval from 2017-12-22 to 2017-12-28.
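One possible way to generalize this to a list of dates is sketched below. The helper name far_from_dates and its parameters are my own invention for illustration, not part of pandas, and the sketch assumes a timezone-naive DatetimeIndex and full dates given as 'YYYY-MM-DD' strings.

```python
import numpy as np
import pandas as pd

def far_from_dates(index, dates, n_days):
    """Return a boolean mask that is True where an entry's calendar day
    is more than n_days away from every date in dates.

    Hypothetical helper; assumes a timezone-naive DatetimeIndex.
    """
    days = index.normalize()  # drop the time of day, keep the date
    mask = np.ones(len(index), dtype=bool)
    for date in pd.to_datetime(dates):
        # whole-day distance between each entry and this date
        distance = np.abs(np.asarray((days - date).days))
        mask &= distance > n_days
    return mask

# Usage with the df and N from above:
# df[far_from_dates(df.index, ["2017-12-25", "2018-01-01"], N)]
```

The mask it returns can be used anywhere the single-date condition is used.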
Lastly, as you probably know, you can combine the two criteria using the & operator.
df[
(df.index.dayofweek < 5)
&
(abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N))
]
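If you want something close to the two functions named in the question, a rough sketch could look like the following. The signatures (both take the dataframe as the first argument) are an assumption on my part, as is the handling of 'MM-DD' strings: each one is expanded to every year the index touches, plus one year on either side, so that a holiday such as '01-01' also shields the last days of the previous December. It assumes a non-empty, timezone-naive DatetimeIndex and month-day strings that are valid in every year (so not '02-29').

```python
import numpy as np
import pandas as pd

def exclude_Sat_and_Sun(df):
    """Keep only rows whose index falls on Monday (0) through Friday (4)."""
    return df[df.index.dayofweek < 5]

def exclude_days_near_omit_days(df, n_days, omit_days):
    """Drop rows within n_days of any 'MM-DD' date in omit_days, in any year.

    Hypothetical helper; assumes a non-empty, timezone-naive DatetimeIndex.
    """
    days = df.index.normalize()  # compare whole calendar days
    # One year of padding on each side catches holidays just across a year boundary
    years = range(days.year.min() - 1, days.year.max() + 2)
    keep = np.ones(len(df), dtype=bool)
    for month_day in omit_days:
        for year in years:
            holiday = pd.Timestamp(f"{year}-{month_day}")
            keep &= np.abs(np.asarray((days - holiday).days)) > n_days
    return df[keep]

# Example on data shaped like the question's (one row every 6 hours)
index = pd.date_range("2017-12-01", "2018-01-05", freq=pd.Timedelta(hours=6))
example = pd.DataFrame({"value": np.arange(len(index))}, index=index)
filtered = exclude_days_near_omit_days(exclude_Sat_and_Sun(example), 3, ["12-25", "01-01"])
```

Here filtered keeps the weekdays from 2017-12-01 through 2017-12-21 plus the single 2018-01-05 entry, since everything from 2017-12-22 to 2018-01-04 lies within 3 days of one of the two holidays.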