Pandas dataframe: omit weekends and days near holidays

Question

I have a Pandas dataframe with a DatetimeIndex and some other columns, similar to this:

import pandas as pd
import numpy as np

idx = pd.date_range('2017-12-01', '2018-01-05', freq='6H')
df = pd.DataFrame(index=idx)

# Average speed in miles per hour
df['value'] = np.random.randint(low=0, high=60, size=len(df.index))

df.info()
# DatetimeIndex: 141 entries, 2017-12-01 00:00:00 to 2018-01-05 00:00:00
# Freq: 6H
# Data columns (total 1 columns):
# value    141 non-null int64
# dtypes: int64(1)
# memory usage: 2.2 KB

df.head(10)
#                      value
# 2017-12-01 00:00:00     15
# 2017-12-01 06:00:00     54
# 2017-12-01 12:00:00     19
# 2017-12-01 18:00:00     13
# 2017-12-02 00:00:00     35
# 2017-12-02 06:00:00     31
# 2017-12-02 12:00:00     58
# 2017-12-02 18:00:00      6
# 2017-12-03 00:00:00      8
# 2017-12-03 06:00:00     30

How can I select or filter the entries that are:

  1. Weekdays only (that is, not Saturday or Sunday)
  2. Not within N days of the dates in a list (e.g. U.S. holidays like '12-25' or '01-01')?

I was hoping for something like this:

df = exclude_Sat_and_Sun(df)

omit_days = ['12-25', '01-01']
N = 3 # days near the holidays
df = exclude_days_near_omit_days(df, N, omit_days)

I was thinking of creating a new column to break out the month and day and then comparing them to criteria 1 and 2 above. However, I was hoping for something more Pythonic using the DatetimeIndex.

Thanks for your help.

Recommended answer

The first part can be easily accomplished using the Pandas DatetimeIndex.dayofweek property, which numbers the days of the week from Monday as 0 to Sunday as 6.

df[df.index.dayofweek < 5] will give you only the weekdays.
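As a minimal sketch, the weekday filter can be wrapped in a function shaped like the one the question asks for. The name exclude_sat_and_sun and the small demo frame are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def exclude_sat_and_sun(frame):
    """Keep only rows whose index falls on Monday (0) through Friday (4)."""
    return frame[frame.index.dayofweek < 5]

# 2017-12-01 is a Friday, so this index covers Fri, Sat, Sun, Mon.
idx = pd.date_range("2017-12-01", "2017-12-04", freq="D")
demo = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

weekdays_only = exclude_sat_and_sun(demo)
print(weekdays_only)
```

The function returns a filtered copy and leaves the input frame unchanged, so it can be chained with other filters.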


For the second part you can use the datetime module. Below I give an example for only one date, namely 2017-12-25. You can easily generalize it to a list of dates, for example by defining a helper function.

from datetime import datetime, timedelta

N = 3

df[abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N)]

This keeps all rows whose date is more than N=3 days away from 2017-12-25. That is, it excludes the 7-day interval from 2017-12-22 to 2017-12-28, inclusive.
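One possible way to generalize the single-date filter to a list of dates is sketched below. The helper name far_from_dates is hypothetical, and it uses pandas' own DatetimeIndex.normalize() and Timestamp arithmetic instead of the datetime module, assuming a timezone-naive index and full dates written as 'YYYY-MM-DD'.

```python
import numpy as np
import pandas as pd

def far_from_dates(index, dates, n_days):
    """Boolean mask: True where the calendar day is more than
    n_days away from every date in `dates`."""
    days = index.normalize()  # drop the time of day, keep the date
    mask = np.ones(len(index), dtype=bool)
    for d in dates:
        # Whole-day distance between each row and this date.
        gap = np.abs((days - pd.Timestamp(d)).days.to_numpy())
        mask &= gap > n_days
    return mask

idx = pd.date_range("2017-12-20", "2018-01-05", freq="D")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

kept = demo[far_from_dates(demo.index, ["2017-12-25", "2018-01-01"], 3)]
print(kept)
```

With N=3, the two exclusion windows (2017-12-22 to 2017-12-28 and 2017-12-29 to 2018-01-04) are adjacent, so only the days outside both remain.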


Lastly, you can combine the two criteria using the & operator, as you probably know.

df[
   (df.index.dayofweek < 5) 
   & 
   (abs(df.index.date - datetime.strptime("2017-12-25", '%Y-%m-%d').date()) > timedelta(N))
  ]
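The question lists holidays as month-day strings such as '12-25' rather than full dates. A rough sketch of how both criteria might be combined for that input follows. The function exclude_days_near_omit_days borrows its name from the question, but its signature and body are assumptions: it expects a non-empty, timezone-naive DatetimeIndex and 'MM-DD' strings that are valid in every year (so not '02-29').

```python
import numpy as np
import pandas as pd

def exclude_days_near_omit_days(frame, n_days, omit_days):
    """Drop rows within n_days of any 'MM-DD' entry, in any year."""
    days = frame.index.normalize()
    # Also check the neighbouring years, so that a holiday such as
    # 01-01 excludes the last days of the preceding December.
    first_year = int(days.year.min()) - 1
    last_year = int(days.year.max()) + 1
    keep = np.ones(len(frame), dtype=bool)
    for year in range(first_year, last_year + 1):
        for month_day in omit_days:
            holiday = pd.Timestamp(f"{year}-{month_day}")
            gap = np.abs((days - holiday).days.to_numpy())
            keep &= gap > n_days
    return frame[keep]

idx = pd.date_range("2017-12-01", "2018-01-05", freq="D")
demo = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Criterion 1 (weekdays only), then criterion 2 (not near a holiday).
weekdays = demo[demo.index.dayofweek < 5]
result = exclude_days_near_omit_days(weekdays, 3, ["12-25", "01-01"])
print(result)
```

Expanding each month-day string into one concrete date per year keeps the comparison a plain date difference, at the cost of a small loop over years and holidays.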
