Taking the intersection of two timestamp range conditions in pandas
Question
import pandas as pd
data = {'date': ['1998-03-01', '2001-04-01','1998-06-01','2001-08-01','2001-05-03'],
'node1': [1, 1, 2,2,3],
'node2': [8,316,26,35,44],
'weight': [1,1,1,1,1], }
df = pd.DataFrame(data, columns = ['date', 'node1','node2','weight'])
print(df)
mask1 = (df['date'] > '1998-01-01 00:00:01') & (df['date'] <= '2000-01-01 00:00:01')
mask2 = (df['date'] > '2000-01-01 00:00:01') & (df['date'] <= '2003-01-01 00:00:01')
mask = pd.concat((mask1, mask2), axis=1)
slct = mask.all(axis=1)
print(df.loc[slct])
The above is my attempt. The dataset (a toy version is shown above) has four columns: node1, node2, weight, and a timestamp (the date column here). I want to create two sets of rows: set 1 should contain the rows whose timestamp falls between 1998 and 2000, and set 2 the rows whose timestamp falls between 2000 and 2002.
In addition, both sets should contain only the nodes (node1 values) that appear in both year ranges (98-00 and 00-02).
So in the above example the two sets should be {1,2} and {1,2}; node 3 should be excluded because it appears only in the 00-02 range. However, I am getting an empty frame as the result. I first built mask1 and mask2 to get the rows satisfying each range, and then concatenated them to find the intersection of the two conditions.
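A likely reason the attempt returns an empty frame: mask1 and mask2 are row-wise masks, and no single date can lie in both ranges at once, so combining them per row is always False. The intersection has to be taken over node1 values instead. The sketch below rebuilds the toy data from the question and shows both points; the variable names row_wise, nodes_in_both and result are illustrative and not part of the original post.

```python
import pandas as pd

# Toy data from the question, with 'date' parsed as datetimes.
df = pd.DataFrame({
    'date': pd.to_datetime(['1998-03-01', '2001-04-01', '1998-06-01',
                            '2001-08-01', '2001-05-03']),
    'node1': [1, 1, 2, 2, 3],
    'node2': [8, 316, 26, 35, 44],
    'weight': [1, 1, 1, 1, 1],
})

t0 = pd.Timestamp('1998-01-01 00:00:01')
t1 = pd.Timestamp('2000-01-01 00:00:01')
t2 = pd.Timestamp('2003-01-01 00:00:01')

mask1 = (df['date'] > t0) & (df['date'] <= t1)   # rows in the first range
mask2 = (df['date'] > t1) & (df['date'] <= t2)   # rows in the second range

# Row-wise intersection: the ranges do not overlap, so this is all False.
row_wise = mask1 & mask2

# Node-wise intersection: node1 values seen in both ranges.
nodes_in_both = set(df.loc[mask1, 'node1']) & set(df.loc[mask2, 'node1'])
result = df[df['node1'].isin(nodes_in_both)]
print(result)
```

Here row_wise.any() is False, which matches the empty frame described above, while the node-wise version keeps the rows for nodes 1 and 2.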
Recommended answer
You can use groupby with isin to find which node1 groups contain dates in both 1998-2000 and 2000-2002, i.e. build the masks from a groupby on node1 like this:
df['date'] = pd.to_datetime(df['date'])
mask = df.groupby('node1').apply(lambda x : (x['date'].dt.year.isin([1998,1999,2000])).any())
mask2 = df.groupby('node1').apply(lambda x : (x['date'].dt.year.isin([2000,2001,2002])).any())
df[df['node1'].isin(mask[mask & mask2].index)] # Get the dataframe
Explanation:
mask and mask2 will give something like:
mask             mask2
node1            node1
1     True       1     True
2     True       2     True
3    False       3     True
dtype: bool      dtype: bool
Later we can use & to combine the two masks element-wise and index with the result, which gives a new mask containing only the True values, i.e.
mask[mask & mask2]
node1
1 True
2 True
dtype: bool
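To make the indexing step concrete, here is a minimal sketch with the two per-node masks written out by hand (the values are copied from the listing above; both and keep are illustrative names). The & operator aligns the two Series on their node1 index, and boolean indexing then keeps only the entries that are True.

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name='node1')
mask = pd.Series([True, True, False], index=idx)   # node has a date in 1998-2000
mask2 = pd.Series([True, True, True], index=idx)   # node has a date in 2000-2002

both = mask & mask2        # element-wise AND, aligned on node1
keep = both[both].index    # node1 values where both conditions hold
print(keep.tolist())
```

mask[mask & mask2].index selects the same labels, since every entry kept by mask & mask2 is also True in mask.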
Then select the rows of df based on the new mask, i.e.
df['node1'].isin(mask[mask & mask2].index)
0 True
1 True
2 True
3 True
4 False
Name: node1, dtype: bool
Output:
df[df['node1'].isin(mask[mask & mask2].index)]
date node1 node2 weight
0 1998-03-01 1 8 1
1 2001-04-01 1 316 1
2 1998-06-01 2 26 1
3 2001-08-01 2 35 1
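One caveat with matching on calendar years: 2000 is in both year lists, so a node whose only dates fall in 2000 would count as being in both ranges. If the exact timestamp bounds from the question matter, a variant along the following lines may be closer to what was asked. It is a sketch that swaps the apply call for groupby(...).transform('any'), which broadcasts the per-node result back to every row; the names in_range1, in_range2, has1 and has2 are assumptions made for illustration.

```python
import pandas as pd

data = {'date': ['1998-03-01', '2001-04-01', '1998-06-01', '2001-08-01', '2001-05-03'],
        'node1': [1, 1, 2, 2, 3],
        'node2': [8, 316, 26, 35, 44],
        'weight': [1, 1, 1, 1, 1]}
df = pd.DataFrame(data, columns=['date', 'node1', 'node2', 'weight'])
df['date'] = pd.to_datetime(df['date'])

# Bounds taken from the question's mask1 / mask2.
t0 = pd.Timestamp('1998-01-01 00:00:01')
t1 = pd.Timestamp('2000-01-01 00:00:01')
t2 = pd.Timestamp('2003-01-01 00:00:01')

in_range1 = (df['date'] > t0) & (df['date'] <= t1)
in_range2 = (df['date'] > t1) & (df['date'] <= t2)

# For every row: does its node1 group have at least one date in the range?
has1 = in_range1.groupby(df['node1']).transform('any')
has2 = in_range2.groupby(df['node1']).transform('any')

result = df[has1 & has2]
print(result)
```

On the toy data this keeps the same four rows as the output above (nodes 1 and 2), and it avoids building an intermediate index of node ids.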