Count number of rows for each ID within 1 year

Question

I have a pandas DataFrame that looks something like this:

Date        ID
01/01/2016  a
05/01/2016  a
10/05/2017  a
05/05/2014  b
07/09/2014  b
12/08/2017  b

What I need to do is add a column showing the number of entries for each ID that occurred within the last year, and another column showing the number within the next year. I've written some horrible code that iterates through the whole DataFrame (millions of rows) and does the computations, but there must be a better way!

Recommended answer

I think you need between with boolean indexing to filter the rows first, then groupby and aggregate with size.

Then concat the two outputs and add reindex so that IDs missing from both results are added as rows filled with 0:

print (df)
         Date ID
0  01/01/2016  a
1  05/01/2016  a
2  10/05/2017  a
3  05/05/2018  b
4  07/09/2014  b
5  07/09/2014  c
6  12/08/2018  b


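import pandas as pd

#convert to datetime (dayfirst=True because the first number is the day)
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
#pd.datetime is deprecated and absent from recent pandas releases, so use pd.Timestamp
now = pd.Timestamp.today()
print (now)

one_year_before_now = now - pd.offsets.DateOffset(years=1)
one_year_after_now = now + pd.offsets.DateOffset(years=1)

#filter to each one-year window first, then count rows per ID
#(the counts printed below depend on the day the code is run)
a = df[df['Date'].between(one_year_before_now, now)].groupby('ID').size()
b = df[df['Date'].between(now, one_year_after_now)].groupby('ID').size()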
print (a)
ID
a    1
dtype: int64

print (b)
ID
b    2
dtype: int64

df1 = pd.concat([a,b],axis=1).fillna(0).astype(int).reindex(df['ID'].unique(),fill_value=0)
print (df1)
   0  1
a  1  0
b  0  2
c  0  0
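
Because the snippet above calls today(), its output changes with the day it is run. The following is a minimal sketch of the same filter, groupby and size steps with the reference date pinned. The value 2018-01-01 is an assumption chosen only for illustration, and the column names last and next are likewise illustrative rather than part of the original answer.

```python
import pandas as pd

# Sample data from the answer (day-first dates)
df = pd.DataFrame({
    "Date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2018", "07/09/2014", "07/09/2014", "12/08/2018"],
    "ID": ["a", "a", "a", "b", "b", "c", "b"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

# Assumed reference date, pinned so the result is reproducible
now = pd.Timestamp("2018-01-01")
year = pd.DateOffset(years=1)

# Count rows per ID in each one-year window (between is inclusive on both ends)
last = df[df["Date"].between(now - year, now)].groupby("ID").size()
nxt = df[df["Date"].between(now, now + year)].groupby("ID").size()

# keys= names the two columns instead of leaving them as 0 and 1
counts = (
    pd.concat([last, nxt], axis=1, keys=["last", "next"])
    .fillna(0)
    .astype(int)
    .reindex(df["ID"].unique(), fill_value=0)
)
print(counts)
```

If the counts are wanted on every original row, df.join(counts, on="ID") should attach them by ID.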

If you instead need to compare every date against a reference date within each group (x.iat[-1] in the code, i.e. the date in the last row of each ID), add or subtract a one-year offset per group. This needs a custom function that evaluates each condition and sums the True values:

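offs = pd.offsets.DateOffset(years=1)

#x holds the dates of one ID; x.iat[-1] is the date in that group's last row
#'last': dates later than one year before the reference date
#'next': dates earlier than one year after the reference date
f = lambda x: pd.Series([(x > x.iat[-1] - offs).sum(),
                         (x < x.iat[-1] + offs).sum()], index=['last','next'])
df = df.groupby('ID')['Date'].apply(f).unstack(fill_value=0).reset_index()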
print (df)
  ID  last  next
0  a     1     3
1  b     2     3
2  c     1     1
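
Both snippets above return one pair of counts per ID, whereas the question describes two new columns on every row. One possible way to get per-row counts without a Python loop over rows is to sort each ID's dates and use numpy.searchsorted. The sketch below is not from the original answer. The helper count_neighbours and the column names last_year and next_year are hypothetical, and it assumes that a row does not count itself (or other rows with the identical date) and that the far edge of each window is inclusive.

```python
import numpy as np
import pandas as pd

# Sample data from the question (day-first dates)
df = pd.DataFrame({
    "Date": ["01/01/2016", "05/01/2016", "10/05/2017",
             "05/05/2014", "07/09/2014", "12/08/2017"],
    "ID": ["a", "a", "a", "b", "b", "b"],
})
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)

year = pd.DateOffset(years=1)

def count_neighbours(dates):
    """Hypothetical helper: per-row counts for the dates of a single ID."""
    s = dates.sort_values()
    vals = s.to_numpy()
    lo = (s - year).to_numpy()   # start of each row's "last year" window
    hi = (s + year).to_numpy()   # end of each row's "next year" window
    # Rows of the same ID with d - 1 year <= date < d
    last_year = (np.searchsorted(vals, vals, side="left")
                 - np.searchsorted(vals, lo, side="left"))
    # Rows of the same ID with d < date <= d + 1 year
    next_year = (np.searchsorted(vals, hi, side="right")
                 - np.searchsorted(vals, vals, side="right"))
    # Keep the original row labels so the result aligns back onto df
    return pd.DataFrame({"last_year": last_year, "next_year": next_year},
                        index=s.index)

# One pass per ID; each pass is a sort plus binary searches
counts = pd.concat([count_neighbours(g) for _, g in df.groupby("ID")["Date"]])
df = df.join(counts)
print(df)
```

The loop here runs once per ID rather than once per row, so it should scale better than row-wise iteration when there are far fewer IDs than rows. Adjust the side arguments if same-day rows or the window edges should be treated differently.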
