Fastest way to eliminate specific dates from pandas dataframe


Question

I'm working with a large data frame and I'm struggling to find an efficient way to eliminate specific dates. Note that I'm trying to eliminate every measurement taken on a specific date.

Pandas has this great feature for a frame with a DatetimeIndex, where you can call:

df.loc['2016-04-22']

and pull all rows from that day. But what if I want to eliminate all rows from '2016-04-22'?

I'd like something like this:

df.loc[~'2016-04-22']

(but this doesn't work)

Also, what if I want to eliminate a list of dates?

Right now, I have the following solution:

import numpy as np
import pandas as pd
from numpy import random

# Create a sample data frame

dates = pd.to_datetime([
    '2016-04-25 06:48:33', '2016-04-27 15:33:23', '2016-04-23 11:23:41', '2016-04-28 12:08:20',
    '2016-04-21 15:03:49', '2016-04-23 08:13:42', '2016-04-27 21:18:22', '2016-04-27 18:08:23',
    '2016-04-27 20:48:22', '2016-04-23 14:08:41', '2016-04-27 02:53:26', '2016-04-25 21:48:31',
    '2016-04-22 12:13:47', '2016-04-27 01:58:26', '2016-04-24 11:48:37', '2016-04-22 08:38:46',
    '2016-04-26 13:58:28', '2016-04-24 15:23:36', '2016-04-22 07:53:46', '2016-04-27 23:13:22',
])

values = random.normal(20, 20, 20)

df = pd.DataFrame(index=dates, data=values, columns=['values']).sort_index()

# This is the list of dates I want to remove

removelist = ['2016-04-22', '2016-04-24']

For each date I want to remove, this for loop grabs the index labels of that date's rows, drops them from the main dataframe's index, then positively selects the remaining dates (i.e. the good dates) from the dataframe.

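for r in removelist:
    # Index labels of every row that falls on date r
    elimlist = df.loc[r].index.tolist()
    ind = df.index.tolist()
    # Keep only the labels that are not being removed
    culind = [i for i in ind if i not in elimlist]
    df = df.loc[culind]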

Is there anything better out there?

I've also tried masking on the range from the rounded date to the rounded date + 1 day, so something like this:

df[~((df['Timestamp'] < r+pd.Timedelta("1 day")) & (df['Timestamp'] > r))]

But this gets really cumbersome and (at the end of the day) I'll still be using a for loop when I need to eliminate n specific dates.

There's got to be a better way! Right? Maybe?

Recommended answer

Same idea as @Alexander, but using the date property of the DatetimeIndex and numpy.in1d:

mask = ~np.in1d(df.index.date, pd.to_datetime(removelist).date)
df = df.loc[mask, :]
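
A variant of the same mask, offered as a minimal sketch rather than as part of the original answer: `DatetimeIndex.normalize()` truncates each timestamp to midnight and `Index.isin` tests membership, so the comparison stays in datetime64 instead of building Python `date` objects. The six-row sample frame and the names `keep` and `filtered` are made up for illustration. Note also that recent NumPy releases deprecate `np.in1d` in favor of `np.isin`, which takes the same two arguments here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: six timestamps spread over five days.
index = pd.to_datetime([
    '2016-04-21 15:03:49',
    '2016-04-22 07:53:46',
    '2016-04-22 12:13:47',
    '2016-04-23 08:13:42',
    '2016-04-24 11:48:37',
    '2016-04-25 06:48:33',
])
df = pd.DataFrame({'values': np.arange(6.0)}, index=index)

removelist = ['2016-04-22', '2016-04-24']

# normalize() sets every timestamp to midnight, so each row can be
# compared directly against the (midnight) dates to remove.
keep = ~df.index.normalize().isin(pd.to_datetime(removelist))
filtered = df.loc[keep]

print(filtered)
```

Only the rows from 2016-04-21, 2016-04-23 and 2016-04-25 remain in `filtered`; no loop over `removelist` is needed.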

Timings:

%timeit df.loc[~np.in1d(df.index.date, pd.to_datetime(removelist).date), :]
1000 loops, best of 3: 1.42 ms per loop

%timeit df[[d.date() not in pd.to_datetime(removelist) for d in df.index]]
100 loops, best of 3: 3.25 ms per loop
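
These numbers come from the 20-row sample frame and will vary with machine and library versions. A rough way to repeat the comparison on a larger frame is sketched below, using the standard-library `timeit` module instead of IPython's `%timeit`. The frame size, random seed, and function names are arbitrary choices for this sketch; `np.isin` stands in for `np.in1d`, and the list comprehension checks membership in a plain `set` of `date` objects rather than in the result of `pd.to_datetime`.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data: random timestamps within a 30-day window.
offsets = rng.integers(0, 30 * 24 * 3600, size=n)
index = pd.Timestamp('2016-04-01') + pd.to_timedelta(offsets, unit='s')
df = pd.DataFrame({'values': rng.normal(20, 20, n)}, index=index).sort_index()

removelist = ['2016-04-22', '2016-04-24']


def with_isin(frame):
    # Vectorized membership test on the index's date objects.
    remove = pd.to_datetime(removelist).date
    return frame.loc[~np.isin(frame.index.date, remove), :]


def with_list_comprehension(frame):
    # Row-by-row membership test in pure Python.
    remove = set(pd.to_datetime(removelist).date)
    return frame[[d.date() not in remove for d in frame.index]]


# Both approaches should keep exactly the same rows.
assert with_isin(df).equals(with_list_comprehension(df))

for func in (with_isin, with_list_comprehension):
    seconds = min(timeit.repeat(lambda: func(df), number=1, repeat=3))
    print(f'{func.__name__}: {seconds * 1e3:.2f} ms')
```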
