How to Perform Consecutive Counts of Column by Group Conditionally Upon Another Column


Question

I'm trying to get consecutive counts of the Noshow column, grouped by the PatientID column. The code below comes very close to the result I want. However, the sum function returns the sum over the whole group. I would like it to sum only the current row and the unbroken run of rows with a '1' directly above it. Basically, for each row I'm trying to count the number of consecutive times a patient has been a no-show for their appointment, resetting to 0 when they do show up. It seems that only a few tweaks to the code below are needed, but I cannot find the answer anywhere on this site.

transform(df, ConsecNoshows = ifelse(Noshow == 0, 0, ave(Noshow, PatientID, FUN = sum)))

The above code produces the following output:

#Source: local data frame [12 x 3]
#Groups: ID [2]
#
#   PatientID Noshow ConsecNoshows
#       <int>  <int>         <int>   
#1          1      0             0
#2          1      1             4
#3          1      0             0
#4          1      1             4
#5          1      1             4
#6          1      1             4
#7          2      0             0
#8          2      0             0
#9          2      1             3
#10         2      1             3
#11         2      0             0
#12         2      1             3

This is what I would like instead:

#Source: local data frame [12 x 3]
#Groups: ID [2]
#
#   PatientID Noshow ConsecNoshows
#       <int>  <int>         <int>   
#1          1      0             0
#2          1      1             0
#3          1      0             1
#4          1      1             0
#5          1      1             1
#6          1      1             2
#7          2      0             0
#8          2      0             0
#9          2      1             0
#10         2      1             1
#11         2      0             2
#12         2      1             0

[UPDATE] I would like the consecutive count to be offset by one row down, so that each row shows the count as it stood before that appointment.

Thank you in advance for any help you can offer!

Recommended answer

And here's another (similar) data.table approach:

library(data.table)
setDT(df)[, ConsecNoshows := seq(.N) * Noshow, by = .(PatientID, rleid(Noshow))]
df
#     PatientID Noshow ConsecNoshows
#  1:         1      0             0
#  2:         1      1             1
#  3:         1      0             0
#  4:         1      1             1
#  5:         1      1             2
#  6:         1      1             3
#  7:         2      0             0
#  8:         2      0             0
#  9:         2      1             1
# 10:         2      1             2
# 11:         2      0             0
# 12:         2      1             1

This basically groups by PatientID and the run-length encoding of Noshow (rleid gives each unbroken run of identical values its own id), creates a sequence 1, 2, ..., .N within each run, and multiplies it by Noshow so that the counts are kept only where Noshow == 1 and become 0 where Noshow == 0.
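Note that the output above is not shifted down by one row, which is what the question's update asks for. The following is a minimal sketch of one way to get the offset version, assuming the data look like the question's printed output. The object name dt and the column names run and ConsecOffset are hypothetical, introduced only for illustration. It uses data.table's shift() to lag the count by one row within each patient.

```r
library(data.table)

# Example data reconstructed from the question's printed output (assumption)
df <- data.frame(
  PatientID = rep(1:2, each = 6),
  Noshow    = c(0L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 1L)
)
dt <- as.data.table(df)

# Step 1: give each unbroken run of identical (PatientID, Noshow) pairs its own id.
# 'run' is a hypothetical helper column, kept only to make the grouping visible.
dt[, run := rleid(PatientID, Noshow)]

# Step 2: count 1..N within each run; multiplying by Noshow zeroes the "showed up" runs
dt[, ConsecNoshows := seq_len(.N) * Noshow, by = run]

# Step 3: lag the count by one row within each patient, filling the first row with 0,
# so each row holds the streak as it stood before that appointment
dt[, ConsecOffset := shift(ConsecNoshows, n = 1L, fill = 0L), by = PatientID]

print(dt)
```

Here ConsecNoshows reproduces the answer's column, and ConsecOffset should correspond to the "what I would like" table in the question. Grouping the shift by PatientID keeps one patient's final streak from leaking into the next patient's first row.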
