Counting number of rows between rows
Problem description
I have a data frame with three columns: user_id, event, and time.
user_id is a user ID, event is either "A" or "B", and time is a timestamp. I need to count the number of "B" values that occur before each "A" value. So if 3 "B" values occur before the first "A", that instance of "A" gets a new column with a value of 3. If there are 25 instances of "B" before the next "A", that one gets a value of 25. I consider myself a solid R/dplyr journeyman but this has me stumped! Thanks.
user_id event date_time desired_column
1 B 2018-01-01 NA
1 B 2018-01-02 NA
1 B 2018-01-03 NA
1 B 2018-01-04 NA
1 B 2018-01-05 NA
1 A 2018-01-06 5
1 B 2018-01-07 NA
1 A 2018-01-08 1
2 B 2018-01-05 NA
2 B 2018-01-06 NA
2 A 2018-01-07 2
2 B ... NA
2 A ... 1
Recommended answer
x <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
user_id event date_time desired_column
1 B 2018-01-01 NA
1 B 2018-01-02 NA
1 B 2018-01-03 NA
1 B 2018-01-04 NA
1 B 2018-01-05 NA
1 A 2018-01-06 5
1 B 2018-01-07 NA
1 A 2018-01-08 1
2 B 2018-01-05 NA
2 B 2018-01-06 NA
2 A 2018-01-07 2')
Perhaps a little clunky, but ...
(edit: specified dplyr::lag, since stats::lag doesn't do what we need.)
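To illustrate why the namespace matters here, the following is a minimal sketch comparing the two functions on a made-up vector `v` (not part of the original data). It assumes dplyr is installed. `dplyr::lag` shifts the values themselves, whereas `stats::lag` only shifts the time base of a series and leaves the values in place.

```r
v <- c(10, 20, 30)  # hypothetical example vector

# dplyr::lag moves every value down one position and pads the front with NA
shifted <- dplyr::lag(v)

# stats::lag keeps the values as they are; it only changes the "tsp"
# (time series parameters) attribute attached to the result
ts_shifted <- stats::lag(v)

print(shifted)                # NA 10 20
print(as.vector(ts_shifted))  # 10 20 30 (values unchanged)
```

Because the answer needs each row to see the value of the row above it, only the dplyr behaviour is useful.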
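# Run-length encode the event column once
r <- rle(x$event)
x$a <- NA
# Store each run's length on the last row of that run
x$a[cumsum(r$lengths)] <- r$lengths
# Shift down one row so each "A" sees the length of the run just before it
x$a <- dplyr::lag(x$a)
# Only "A" rows keep a count
x$a[x$event == "B"] <- NA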
x
# user_id event date_time desired_column a
# 1 1 B 2018-01-01 NA NA
# 2 1 B 2018-01-02 NA NA
# 3 1 B 2018-01-03 NA NA
# 4 1 B 2018-01-04 NA NA
# 5 1 B 2018-01-05 NA NA
# 6 1 A 2018-01-06 5 5
# 7 1 B 2018-01-07 NA NA
# 8 1 A 2018-01-08 1 1
# 9 2 B 2018-01-05 NA NA
# 10 2 B 2018-01-06 NA NA
# 11 2 A 2018-01-07 2 2
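One caveat: the run-length approach above does not look at user_id, so a run of "B" rows at the end of one user would merge with a run of "B" rows at the start of the next. The following is a hedged sketch of a grouped dplyr alternative, not the original answer's method. The column names `block` and `b_before_a` and the small data frame `y` are made up for illustration, and the rows are assumed to be already sorted by time within each user.

```r
library(dplyr)

# Hypothetical data: user 1 ends with a "B" and user 2 starts with a "B"
y <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
user_id event date_time
1 B 2018-01-01
1 B 2018-01-02
1 A 2018-01-03
1 B 2018-01-04
2 B 2018-01-05
2 A 2018-01-06')

result <- y %>%
  group_by(user_id) %>%
  # block = number of "A" rows seen strictly before this row, per user,
  # so every "A" closes the block that holds the "B" rows leading up to it
  mutate(block = dplyr::lag(cumsum(event == "A"), default = 0L)) %>%
  group_by(user_id, block) %>%
  # on "A" rows, count the "B" rows in the same block; "B" rows stay NA
  mutate(b_before_a = ifelse(event == "A", sum(event == "B"), NA_integer_)) %>%
  ungroup() %>%
  select(-block)

print(result$b_before_a)  # NA NA 2 NA NA 1
```

Here the "A" for user 2 gets 1 rather than 2, because the trailing "B" of user 1 is not counted. Unlike the run-length version, an "A" that directly follows another "A" gets 0 instead of NA in this sketch.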