Compare to next row, grouped, data.table
Problem description
I have a data frame containing number of page views per user, per week. I want to determine, for each user, whether their views increased, decreased, or stayed the same after a certain event. My data looks like this:
Userid  week  xeventinweek  numviews
Alice   1     2             5
Alice   2     0             3
Alice   4     1             6
Bob     2     2             3
Bob     3     0             5
So in this case, Alice's views decreased after she had 2 events in week 1, and she had no events in week 2 to measure by. Bob, however, increased his views from 3 to 5 the week after he had two events.
I would like to get a table with the difference in views for every week that had at least one event. So it should look something like this:
Userid  week  xeventinweek  numviews  numnextweek  difference
Alice   1     2             5         3            -2
Alice   4     1             6         NA           NA    # the row for week 2 is missing because Alice had no events that week
Bob     2     2             3         5            2
It is not essential to have both the numnextweek and difference columns; either one is fine.
I was able to do this using data.table and a for loop, but it took so long to run that it wasn't feasible. I thought of using a rolling join, but it doesn't seem possible with grouped data (i.e. it would need to be done individually for each Userid.) How can I do this using data.table's native functionality?
Recommended answer
Use match:
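# Within each user, look up the views in the row whose week equals week + 1 (NA if there is no such week)
dat[, numnextweek := numviews[match(week + 1, week)], by = Userid]
# Next week's views minus this week's views, so a drop is negative as in the question
dat[, difference := numnextweek - numviews]
# Keep only the weeks that had at least one event
dat[xeventinweek != 0]
# Userid week xeventinweek numviews numnextweek difference
#1: Alice 1 2 5 3 -2
#2: Alice 4 1 6 NA NA
#3: Bob 2 2 3 5 2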
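For anyone who wants to run this end to end, here is a minimal self-contained sketch. It builds the sample table from the question and applies the match approach. It then adds a shift-based variant for comparison; that variant is not part of the original answer. It assumes rows are already sorted by week within each Userid and that the installed data.table provides fifelse. The column name numnext_shift is made up for illustration.

```r
library(data.table)

# Sample data copied from the question
dat <- data.table(
  Userid       = c("Alice", "Alice", "Alice", "Bob", "Bob"),
  week         = c(1, 2, 4, 2, 3),
  xeventinweek = c(2, 0, 1, 2, 0),
  numviews     = c(5, 3, 6, 3, 5)
)

# match(): per user, find the position of week + 1 among that user's weeks;
# indexing numviews with NA yields NA when the following week is absent
dat[, numnextweek := numviews[match(week + 1, week)], by = Userid]
dat[, difference := numnextweek - numviews]

# Illustrative alternative (assumes rows are sorted by week within Userid):
# take the next row's views, but only when that row really is week + 1
dat[, numnext_shift := fifelse(
  shift(week, type = "lead") == week + 1,
  shift(numviews, type = "lead"),
  NA_real_
), by = Userid]

# Keep only the weeks with at least one event
res <- dat[xeventinweek != 0]
print(res)
```

A plain shift(numviews, type = "lead") without the week check would compare Alice's week 2 against week 4, which is the next row but not the next week. The match version avoids that without needing sorted input.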