Pandas filtering - between_time on a non-index column

Problem description

I need to filter my data down to specific hours of the day. The DataFrame method between_time seems to be the proper way to do that, but it only works on the index of the DataFrame, and I need to keep the data in its original layout (e.g. pivot tables expect the datetime column to be a regular column with its proper name, not the index).

This means that each filter looks something like this:

df.set_index(keys='my_datetime_field').between_time('8:00','21:00').reset_index()

This implies two index operations (a set_index and a reset_index) every time such a filter is run.

Is this good practice, or is there a more appropriate way to do the same thing?

Recommended answer

Create a DatetimeIndex, but store it in a variable rather than in the DataFrame. Then call its indexer_between_time method. This returns an array of integer positions, which can then be used to select rows from df using iloc:

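import numpy as np
import pandas as pd

# Sample data: N hourly timestamps stored in an ordinary column (not the index)
N = 100
df = pd.DataFrame(
    {'date': pd.date_range('2000-1-1', periods=N, freq='h'),  # 'H' is deprecated since pandas 2.2
     'value': np.random.random(N)})

# Build the DatetimeIndex as a standalone variable; df itself is left unchanged
index = pd.DatetimeIndex(df['date'])
# indexer_between_time returns integer positions, so select the rows with iloc
df.iloc[index.indexer_between_time('8:00','21:00')]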
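
If this filter is needed in several places, the two steps can be wrapped in a small helper. The following is only a sketch: the name filter_between_time and its parameters are hypothetical (it is not part of pandas), and the sample frame deliberately puts the 'date' column last to make the difference from the set_index/reset_index round trip visible.

```python
import numpy as np
import pandas as pd


def filter_between_time(frame, column, start, end):
    """Return the rows of `frame` whose `column` time of day lies in [start, end].

    Hypothetical helper for illustration; not a pandas API.
    """
    # Build the DatetimeIndex outside the frame, so the frame's own index is untouched.
    times = pd.DatetimeIndex(frame[column])
    # indexer_between_time returns integer positions, hence iloc rather than loc.
    return frame.iloc[times.indexer_between_time(start, end)]


N = 100
df = pd.DataFrame(
    {'value': np.random.random(N),
     'date': pd.date_range('2000-01-01', periods=N, freq=pd.Timedelta(hours=1))})

# Positional selection: original index labels and column order are kept.
filtered = filter_between_time(df, 'date', '8:00', '21:00')

# The round trip from the question, for comparison: it selects the same rows,
# but moves 'date' to the first column and replaces the index with a new RangeIndex.
round_trip = df.set_index('date').between_time('8:00', '21:00').reset_index()
```

Both approaches select the same rows. The difference is that the helper leaves the original index labels and column order intact, which matters if later code relies on either.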
