Pandas.DataFrame slicing with multiple date ranges


Question

I have a datetime-indexed DataFrame with 100,000+ rows. Is there a convenient way in pandas to get the subset of this DataFrame that falls within multiple date ranges?

For example, say we have two date ranges:

(datetime.datetime(2016,6,27,0,0,0), datetime.datetime(2016,6,27,5,0,0))

(datetime.datetime(2016,6,27,15,0,0), datetime.datetime(2016,6,27,23,59,59))

Say we want all rows of the DataFrame that fall in either the first or the second date range, where the DataFrame has one row for every second from 2016-06-27 00:00:00 to 2016-06-27 23:59:59. Is there an easy way to do this in pandas?

Thanks for your help!

Recommended answer

There are two main ways to slice a DataFrame with a DatetimeIndex by date.

  • by slices: df.loc[start:end]. If there are multiple date ranges, the individual slices can be concatenated with pd.concat.

  • by boolean selection mask: df.loc[mask]

Using pd.concat and slices:

import numpy as np
import pandas as pd
np.random.seed(2016)

# 100 rows of random integers, one row every 45 minutes
N = 10**2
df = pd.DataFrame(np.random.randint(10, size=(N, 2)),
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))

# take one slice per date range, then stack the slices
result = pd.concat([df.loc['2016-6-27':'2016-6-27 5:00'],
                    df.loc['2016-6-27 15:00':'2016-6-27 23:59:59']])

yields

                     0  1
2016-06-27 00:00:00  0  2
2016-06-27 00:45:00  5  5
2016-06-27 01:30:00  9  6
2016-06-27 02:15:00  8  4
2016-06-27 03:00:00  5  0
2016-06-27 03:45:00  4  8
2016-06-27 04:30:00  7  0
2016-06-27 15:00:00  2  5
2016-06-27 15:45:00  6  7
2016-06-27 16:30:00  6  8
2016-06-27 17:15:00  5  1
2016-06-27 18:00:00  2  9
2016-06-27 18:45:00  9  1
2016-06-27 19:30:00  9  7
2016-06-27 20:15:00  3  6
2016-06-27 21:00:00  3  5
2016-06-27 21:45:00  0  8
2016-06-27 22:30:00  5  6
2016-06-27 23:15:00  0  8
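
The same concatenation extends to any number of ranges. The following is a minimal sketch, assuming the ranges are held as a list of (start, end) datetime pairs as in the question; the names date_ranges, pieces, and result_from_list are illustrative and not part of the original answer.

```python
import datetime

import numpy as np
import pandas as pd

np.random.seed(2016)
N = 10**2
df = pd.DataFrame(np.random.randint(10, size=(N, 2)),
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))

# (start, end) pairs taken from the question; the list name is an assumption
date_ranges = [
    (datetime.datetime(2016, 6, 27, 0, 0, 0),
     datetime.datetime(2016, 6, 27, 5, 0, 0)),
    (datetime.datetime(2016, 6, 27, 15, 0, 0),
     datetime.datetime(2016, 6, 27, 23, 59, 59)),
]

# one label-based slice per range, stacked in the order given
pieces = [df.loc[start:end] for start, end in date_ranges]
result_from_list = pd.concat(pieces)
```

If two ranges overlap, the rows in the overlap appear once per range in the concatenated result, so this form suits ranges that are known to be disjoint.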


Note that unlike most slicing syntaxes used in Python,

df.loc['2016-6-27':'2016-6-27 5:00']

is inclusive on both ends: the slice defines a closed interval, not a half-open one.
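
To make the difference concrete, here is a small sketch comparing a label-based slice with a positional one on a toy Series. The Series s and its six-row index are made up for illustration.

```python
import pandas as pd

# six timestamps, 30 minutes apart, starting at midnight
idx = pd.date_range('2016-06-27', periods=6, freq='30min')
s = pd.Series(range(6), index=idx)

# label-based: both 00:30 and 01:30 are included
by_label = s.loc[pd.Timestamp('2016-06-27 00:30'):pd.Timestamp('2016-06-27 01:30')]

# position-based: the usual half-open Python slice, position 3 is excluded
by_position = s.iloc[1:3]

print(len(by_label))     # 3
print(len(by_position))  # 2
```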

Using a boolean selection mask:

mask = (((df.index >= '2016-6-27') & (df.index <= '2016-6-27 5:00')) 
        | ((df.index >= '2016-6-27 15:00') & (df.index < '2016-6-28')))
result2 = df.loc[mask]
assert result.equals(result2)
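
For an arbitrary list of ranges, the mask can be built up in a loop. This is a sketch under the assumption that every range is meant as a closed interval; select_ranges is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def select_ranges(df, ranges):
    """Return the rows whose index lies in any closed (start, end) range."""
    mask = np.zeros(len(df), dtype=bool)
    for start, end in ranges:
        # OR each range's membership test into the running mask
        mask |= (df.index >= start) & (df.index <= end)
    return df.loc[mask]


np.random.seed(2016)
N = 10**2
df = pd.DataFrame(np.random.randint(10, size=(N, 2)),
                  index=pd.date_range('2016-6-27', periods=N, freq='45min'))

ranges = [
    (pd.Timestamp('2016-06-27 00:00'), pd.Timestamp('2016-06-27 05:00')),
    (pd.Timestamp('2016-06-27 15:00'), pd.Timestamp('2016-06-27 23:59:59')),
]
subset = select_ranges(df, ranges)
```

Because each row is either selected or not, overlapping ranges do not produce repeated rows with this approach, and the rows come back in their original order.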
