Pandas filtering - between_time on a non-index column
Problem description
I need to filter data by specific hours of the day. The DataFrame method between_time seems to be the proper way to do that; however, it only works on the index of the DataFrame, and I need to keep the data in its original format (e.g. pivot tables will expect the datetime column to be present under its proper name, not as the index).
This means that each filter looks something like this:
df.set_index(keys='my_datetime_field').between_time('8:00','21:00').reset_index()
This implies two reindexing operations every time such a filter is run.
Is this good practice, or is there a more appropriate way to do the same thing?
Recommended answer
Create a DatetimeIndex, but store it in a variable, not in the DataFrame. Then call its indexer_between_time method. This returns an integer array, which can then be used to select rows from df using iloc:
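import numpy as np
import pandas as pd

N = 100
# Sample data: N hourly timestamps (pandas 2.2+ prefers the lowercase alias 'h').
df = pd.DataFrame(
    {'date': pd.date_range('2000-1-1', periods=N, freq='H'),
     'value': np.random.random(N)})

# Keep the DatetimeIndex in a variable; df's own index is left untouched.
index = pd.DatetimeIndex(df['date'])
# indexer_between_time returns integer positions, so select rows with iloc.
df.iloc[index.indexer_between_time('8:00', '21:00')]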
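If this filter is needed in several places, the same idea can be wrapped in a small helper. The following is only a sketch: the function name filter_between_time and the sample frame (one timestamp every six hours) are made up for illustration and are not part of pandas or of the original question. It also runs the set_index/reset_index round trip from the question so the two results can be compared.

```python
import pandas as pd


def filter_between_time(df, column, start_time, end_time):
    """Return the rows of df whose `column` time of day lies in [start_time, end_time].

    Hypothetical helper, not a pandas API.
    """
    # Build the index in a temporary variable; df keeps its own index and columns.
    times = pd.DatetimeIndex(df[column])
    # Integer positions of the matching rows (both endpoints included by default).
    positions = times.indexer_between_time(start_time, end_time)
    return df.iloc[positions]


# Illustrative data: one timestamp every six hours over two days.
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2000-01-01 00:00', '2000-01-01 06:00',
        '2000-01-01 12:00', '2000-01-01 18:00',
        '2000-01-02 00:00', '2000-01-02 06:00',
        '2000-01-02 12:00', '2000-01-02 18:00',
    ]),
    'value': list(range(8)),
})

# Approach from the answer: no change to df's index.
filtered = filter_between_time(df, 'date', '8:00', '21:00')

# Approach from the question: set the index, filter, then reset it.
round_trip = df.set_index('date').between_time('8:00', '21:00').reset_index()
```

Both approaches select the same rows. One difference worth noting is that the helper keeps the original row labels of df, whereas the round trip through reset_index renumbers the result from 0.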