Taking the intersection of two timestamp range conditions in pandas


Question

import pandas as pd

data = {'date': ['1998-03-01', '2001-04-01', '1998-06-01', '2001-08-01', '2001-05-03'],
        'node1': [1, 1, 2, 2, 3],
        'node2': [8, 316, 26, 35, 44],
        'weight': [1, 1, 1, 1, 1]}
df = pd.DataFrame(data, columns=['date', 'node1', 'node2', 'weight'])
print(df)

# Rows in each date range (string comparison works because the dates are ISO formatted)
mask1 = (df['date'] > '1998-01-01 00:00:01') & (df['date'] <= '2000-01-01 00:00:01')
mask2 = (df['date'] > '2000-01-01 00:00:01') & (df['date'] <= '2003-01-01 00:00:01')

# Put the two masks side by side and keep rows where both are True
mask = pd.concat((mask1, mask2), axis=1)
slct = mask.all(axis=1)
print(df.loc[slct])


The above is my attempt. The dataset (the above is a toy dataset) has four columns: node1, node2, weight and a timestamp (the date column). I want to create two sets of rows: set 1 should have rows whose timestamp falls in the years 98-00 (1998-2000), and set 2 rows in the years 00-02 (2000-2002).


Also, both sets should only contain rows for nodes that appear in both year ranges (98-00 and 00-02).


So in the above example, both sets should contain nodes {1, 2}; node 3 should be excluded because it appears only in the 00-02 range. But I am getting an empty frame as the result. First I built mask1 and mask2 to get the rows satisfying each individual range, and then I concatenated them to find the intersection of the two conditions.
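A quick check of why the attempt comes back empty: concatenating the two masks along axis=1 and calling all(axis=1) is the same as a row-wise mask1 & mask2, which asks a single row's date to lie in both ranges at once. Because the ranges in the code (1998-2000 and 2000-2003) do not overlap, no row can satisfy both. This sketch rebuilds the toy data, assuming the dates are parsed with pd.to_datetime, and confirms that.

```python
import pandas as pd

# Toy data from the question, with dates parsed to datetime64
df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
    'node2': [8, 316, 26, 35, 44],
    'weight': [1, 1, 1, 1, 1],
})

mask1 = (df['date'] > '1998-01-01 00:00:01') & (df['date'] <= '2000-01-01 00:00:01')
mask2 = (df['date'] > '2000-01-01 00:00:01') & (df['date'] <= '2003-01-01 00:00:01')

# concat + all(axis=1) is just a row-wise AND of the two masks
slct = pd.concat((mask1, mask2), axis=1).all(axis=1)
row_wise_and = mask1 & mask2

# The ranges are disjoint, so no single row is in both
print(bool((slct == row_wise_and).all()))  # True
print(bool(slct.any()))                    # False
```

The intersection the question wants is therefore not per row but per node: a node qualifies if some of its rows fall in one range and some in the other.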

Recommended answer


You can use groupby with isin to find the node1 values that have dates in both 1998-2000 and 2000-2002, i.e. build masks from a groupby on node1 like this:

df['date'] = pd.to_datetime(df['date'])
# Per node1: True if any of its dates falls in 1998-2000
mask = df.groupby('node1').apply(lambda x: x['date'].dt.year.isin([1998, 1999, 2000]).any())
# Per node1: True if any of its dates falls in 2000-2002
mask2 = df.groupby('node1').apply(lambda x: x['date'].dt.year.isin([2000, 2001, 2002]).any())

df[df['node1'].isin(mask[mask & mask2].index)]  # Keep rows whose node1 is True in both masks
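The same per-node masks can also be computed without groupby().apply, by grouping the boolean year test directly. This is a sketch of an alternative formulation, not the code from the answer. It assumes df['date'] has already been converted with pd.to_datetime; Series.between is inclusive on both ends by default, so between(1998, 2000) matches the isin([1998, 1999, 2000]) list above.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
    'node2': [8, 316, 26, 35, 44],
    'weight': [1, 1, 1, 1, 1],
})

years = df['date'].dt.year
# Per node1: does any of its rows fall in each year range? (bounds inclusive)
mask = years.between(1998, 2000).groupby(df['node1']).any()
mask2 = years.between(2000, 2002).groupby(df['node1']).any()

# Nodes present in both ranges, then the rows that belong to them
both = mask & mask2
result = df[df['node1'].isin(both[both].index)]
print(result)
```

Grouping a boolean Series by another Series avoids calling a Python lambda once per group, which tends to matter only on larger frames.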

Explanation:

mask and mask2 will give something like:


mask
node1
1     True
2     True
3    False
dtype: bool

mask2
node1
1    True
2    True
3    True
dtype: bool


Then we can combine the two masks with & (element-wise AND, following the usual truth table) and use the result to index mask, which leaves only the nodes that are True in both, i.e.

mask[mask & mask2] 


node1
1    True
2    True
dtype: bool
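Since the question is framed as an intersection, the same step can also be written with plain Python sets: collect the node1 values seen in each year range and intersect them. A minimal sketch on the toy data; the names nodes_a, nodes_b and common are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
})

years = df['date'].dt.year
# node1 values with at least one date in each year range
nodes_a = set(df.loc[years.isin([1998, 1999, 2000]), 'node1'])
nodes_b = set(df.loc[years.isin([2000, 2001, 2002]), 'node1'])

common = nodes_a & nodes_b
print(sorted(common))  # [1, 2]
```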

Then we select the rows based on the new mask, i.e.

df['node1'].isin(mask[mask & mask2].index)


0     True
1     True
2     True
3     True
4    False
Name: node1, dtype: bool
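Another way to get this row-level mask, sketched here as an alternative rather than taken from the answer, is groupby().transform('any'): it broadcasts each node's result back onto that node's own rows, so no isin lookup against the index of mask is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
})

years = df['date'].dt.year
in_a = years.isin([1998, 1999, 2000])
in_b = years.isin([2000, 2001, 2002])

# For every row: does its node have any row in range A and any row in range B?
row_mask = (in_a.groupby(df['node1']).transform('any')
            & in_b.groupby(df['node1']).transform('any'))
print(row_mask.tolist())  # [True, True, True, True, False]
```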

Output:

df[df['node1'].isin(mask[mask & mask2].index)]


        date  node1  node2  weight
0 1998-03-01      1      8       1
1 2001-04-01      1    316       1
2 1998-06-01      2     26       1
3 2001-08-01      2     35       1
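The question asks for two sets of rows rather than one combined frame. One way to get them, sketched here under the assumption that the answer's year lists define the two ranges, is to split the filtered rows by year range again. The block recomputes the filtered frame (called result here, a hypothetical name) with the same logic as the answer, written without apply. Note that a row dated in 2000 would land in both sets, since 2000 appears in both lists.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
    'node2': [8, 316, 26, 35, 44],
    'weight': [1, 1, 1, 1, 1],
})

# Same filter as the answer: keep nodes with dates in both year ranges
years = df['date'].dt.year
mask = years.isin([1998, 1999, 2000]).groupby(df['node1']).any()
mask2 = years.isin([2000, 2001, 2002]).groupby(df['node1']).any()
result = df[df['node1'].isin(mask[mask & mask2].index)]

# Split the kept rows into the two requested sets
result_years = result['date'].dt.year
set1 = result[result_years.isin([1998, 1999, 2000])]
set2 = result[result_years.isin([2000, 2001, 2002])]
print(set1)
print(set2)
```

On the toy data both sets cover nodes {1, 2}, which matches the expected result stated in the question.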
