How to Perform Consecutive Counts of a Column by Group, Conditionally Upon Another Column
Question
I'm trying to get consecutive counts from the Noshow column, grouped by the PatientID column. The code below comes very close to the result I want. However, the sum function returns the sum for the whole group. I would like it to sum only the current row and the consecutive rows directly above it that have a 1. Basically, for each row I'm trying to count the number of consecutive times a patient has no-showed their appointments, and then reset the count to 0 when they do show. It seems that only a few tweaks to my code are needed, but I cannot find the answer anywhere on this site.
transform(df, ConsecNoshows = ifelse(Noshow == 0, 0, ave(Noshow, PatientID, FUN = sum)))
The above code produces the following output:
#Source: local data frame [12 x 3]
#Groups: ID [2]
#
# PatientID Noshow ConsecNoshows
# <int> <int> <int>
#1 1 0 0
#2 1 1 4
#3 1 0 0
#4 1 1 4
#5 1 1 4
#6 1 1 4
#7 2 0 0
#8 2 0 0
#9 2 1 3
#10 2 1 3
#11 2 0 0
#12 2 1 3
This is what I would like to get:
#Source: local data frame [12 x 3]
#Groups: ID [2]
#
# PatientID Noshow ConsecNoshows
# <int> <int> <int>
#1 1 0 0
#2 1 1 0
#3 1 0 1
#4 1 1 0
#5 1 1 1
#6 1 1 2
#7 2 0 0
#8 2 0 0
#9 2 1 0
#10 2 1 1
#11 2 0 2
#12 2 1 0
[UPDATE] I would like the consecutive count to be offset one row down.
Thank you in advance for any help you can offer!
Recommended Answer
Here's another (similar) data.table approach:
library(data.table)
setDT(df)[, ConsecNoshows := seq(.N) * Noshow, by = .(PatientID, rleid(Noshow))]
df
# PatientID Noshow ConsecNoshows
# 1: 1 0 0
# 2: 1 1 1
# 3: 1 0 0
# 4: 1 1 1
# 5: 1 1 2
# 6: 1 1 3
# 7: 2 0 0
# 8: 2 0 0
# 9: 2 1 1
# 10: 2 1 2
# 11: 2 0 0
# 12: 2 1 1
This groups the rows by PatientID and by the run-length encoding of Noshow (rleid), so that each run of identical Noshow values within a patient forms its own group. Within each group, seq(.N) numbers the rows 1, 2, ..., and multiplying by Noshow zeroes out the runs where Noshow == 0, keeping the count only where Noshow == 1.
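The output above is not yet offset one row down, as the [UPDATE] in the question requests. A minimal sketch of one way to add that offset, assuming data.table's shift() applied per PatientID with fill = 0L, follows. The sample data is reconstructed from the question's printed output, and the RunID and ConsecNoshowsLagged column names are hypothetical, introduced only for illustration.

```r
library(data.table)

# Sample data reconstructed from the question's printed output (an assumption)
df <- data.frame(
  PatientID = rep(1:2, each = 6),
  Noshow    = c(0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L)
)
setDT(df)

# Optional: expose the run id so the grouping is visible.
# rleid() increments every time Noshow changes value.
df[, RunID := rleid(Noshow), by = PatientID]

# Step 1 (the answer's approach): number the rows within each run,
# then multiply by Noshow so runs of 0 stay at 0.
df[, ConsecNoshows := seq_len(.N) * Noshow, by = .(PatientID, rleid(Noshow))]

# Step 2 (assumed extension): lag the count by one row within each patient,
# filling each patient's first row with 0.
df[, ConsecNoshowsLagged := shift(ConsecNoshows, n = 1L, fill = 0L), by = PatientID]

df
```

With this sample data, ConsecNoshowsLagged comes out as 0, 0, 1, 0, 1, 2 for patient 1 and 0, 0, 0, 1, 2, 0 for patient 2, which matches the desired output shown in the question. Grouping the shift() by PatientID keeps one patient's last count from leaking into the next patient's first row.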