Compute the nth day from the first event in Pandas
Question
I have the following data frame, a subset of my original data frame, with columns event, unixtime, and day. I want to add another column, arbday, giving the nth day since the first event (the first visit being day 1):
>>> import numpy as np
>>> import pandas as pd
>>> import datetime as dt
>>> testdf = pd.DataFrame({'event': range(1, 4), 'unixtime': [1346617885925, 1346961625305, 1347214217566]}, index=[343352, 343353, 343354])
>>> testdf['day'] = testdf['unixtime'].apply(lambda x: dt.datetime.utcfromtimestamp(x/1000).date())
Desired output:
        event       unixtime         day  arbday
343352      1  1346617885925  2012-09-02       1
343353      2  1346961625305  2012-09-06       5
343354      3  1347214217566  2012-09-09       8
After looking around, I tried the following:
>>> testdf2['arbday'] = np.where(testdf2['event']==1, 1, testdf2.day.apply(lambda x: x-x[:1]))
        event       unixtime         day  arbday
343352      1  1346617885925  2012-09-02       1
343353      2  1346961625305  2012-09-06     NaN
343354      3  1347214217566  2012-09-09     NaN
or
>>> testdf2['arbday'] = np.where(testdf2['event']==1, 1, testdf2.day.apply(lambda x: dt.timedelta(x-x[:1])))
TypeError: 'datetime.date' object is not subscriptable
What is the correct way to do this? Any pointers are much appreciated!
EDIT: A follow-up question about applying this over groups is here.
Recommended answer
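import pandas as pd
df = pd.DataFrame({'event': range(1, 4), 'unixtime': [1346617885925, 1346961625305, 1347214217566]})
# Interpret the millisecond timestamps as UTC (as in the question) so the dates do not depend on the local time zone.
df['day'] = pd.to_datetime(df['unixtime'], unit='ms').dt.date
# Day of the first row where event == 1 (DataFrame.get_value was removed in pandas 1.0, so use .loc instead).
first_day = df.loc[df['event'] == 1, 'day'].iloc[0]
df['arbday'] = df['day'].map(lambda x: (x - first_day).days + 1)
print(df)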
Output:
   event       unixtime         day  arbday
0      1  1346617885925  2012-09-02       1
1      2  1346961625305  2012-09-06       5
2      3  1347214217566  2012-09-09       8
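The attempts in the question fail because `Series.apply` passes each element to the lambda one at a time, so `x` is a single `datetime.date` and `x[:1]` cannot slice it. The answer above avoids this by mapping a lambda over each row. As a rough alternative sketch, not part of the original answer, the same column can be computed without a per-row lambda by keeping the days as pandas datetimes and subtracting the first day from the whole column. This assumes the timestamps are milliseconds to be read as UTC and that at least one row has `event == 1`. The names `day` and `first_day` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'event': range(1, 4),
    'unixtime': [1346617885925, 1346961625305, 1347214217566],
})

# Convert millisecond timestamps to naive datetimes (read as UTC),
# truncated to midnight so that only the calendar day remains.
day = pd.to_datetime(df['unixtime'], unit='ms').dt.normalize()
df['day'] = day.dt.date

# Day of the first row where event == 1 (assumes such a row exists).
first_day = day[df['event'] == 1].iloc[0]

# Whole-day difference, offset by 1 so the first visit counts as day 1.
df['arbday'] = (day - first_day).dt.days + 1
print(df)
```

Subtracting a scalar `Timestamp` from a datetime column yields a timedelta column, and `.dt.days` extracts the whole-day count, so no Python-level loop over rows is needed.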