Count rows from every previous value
Question
This question is the same as here, but this time I want to divide every value by the count of the previous value, not by its own count. For the first value (1500) the result is NA, because no value comes before it. Then 1100 is divided by 4, because the previous value (1500) has a count of 4. Next, 200 is divided by 3, because the previous value (1100) has a count of 3. Finally, 1100 is divided by 2, because 200 has a count of 2. I tried shift/lag but couldn't get it to work.
This is the code that divides every value by its own count.
library(dplyr)
library(tidyverse)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
                                "2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
                                "2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
                                "2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
                                "2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
             myval = c(0, NA, 1500, 1500,
                       1500, 1500, NA, 0,
                       0, 0, 1100, 1100,
                       1100, 0, 200, 200,
                       1100, 1100, 1100, 0
             ))
# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA
df <- df %>%
  group_by(grp = data.table::rleid(myval)) %>%
  mutate(counts = n(),
         result = myval / counts)
#   mydate     myval   grp counts result
#   <date>     <dbl> <int>  <int>  <dbl>
# 1 2019-05-11    NA     1      2     NA
# 2 2019-05-11    NA     1      2     NA
# 3 2019-05-11  1500     2      4    375
# 4 2019-05-11  1500     2      4    375
# 5 2019-05-12  1500     2      4    375
# 6 2019-05-12  1500     2      4    375
# 7 2019-05-12    NA     3      4     NA
# 8 2019-05-12    NA     3      4     NA
# 9 2019-05-13    NA     3      4     NA
#10 2019-05-13    NA     3      4     NA
#11 2019-05-13  1100     4      3    367.
#12 2019-05-13  1100     4      3    367.
#13 2019-05-14  1100     4      3    367.
#14 2019-05-14    NA     5      1     NA
#15 2019-05-14   200     6      2    100
#16 2019-05-14   200     6      2    100
#17 2019-05-15  1100     7      3    367.
#18 2019-05-15  1100     7      3    367.
#19 2019-05-15  1100     7      3    367.
#20 2019-05-15    NA     8      1     NA
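One possible reason a plain shift/lag does not give the wanted result is that it moves the counts by one row, not by one run, and the NA runs sit between the value runs. A minimal base R sketch of that effect follows; the `counts` vector is copied from the column printed above, and `row_lag` is only an illustrative name.

```r
# Per-row counts, copied from the `counts` column printed above
counts <- c(2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 1, 2, 2, 3, 3, 3, 1)

# Row-wise lag: shift the whole column down by one position
row_lag <- c(NA, head(counts, -1))

# Rows 15-16 hold the value 200. The wanted divisor is 3 (the count of the
# previous non-NA value, 1100), but the row-wise lag picks up the NA run's
# count on row 15 and the run's own count on row 16.
row_lag[15:16]
# 1 2
```

So the shift has to happen once per non-NA run, after the counts have been collapsed to one row per run.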
I want to keep the above data frame, with the dates column and the correct result.
Recommended answer
Here is one way:
library(dplyr)
# Create a run (group) number for each stretch of identical values.
# df is the question's data frame after the NA replacement, before group_by().
df1 <- df %>% mutate(grp = data.table::rleid(myval))
df1 %>%
  # Keep only non-NA values
  filter(!is.na(myval)) %>%
  # Count the rows in each group
  count(grp, name = 'count') %>%
  # Shift the counts down so each group gets the previous non-NA group's count
  mutate(count = lag(count)) %>%
  # Join back to the original data
  right_join(df1, by = 'grp') %>%
  # Divide each value by the previous group's count to get the final result
  mutate(result = myval / count) %>%
  arrange(grp)
This returns:
# A tibble: 20 x 5
#     grp count mydate     myval result
#   <int> <int> <date>     <dbl>  <dbl>
# 1     1    NA 2019-05-11    NA     NA
# 2     1    NA 2019-05-11    NA     NA
# 3     2    NA 2019-05-11  1500     NA
# 4     2    NA 2019-05-11  1500     NA
# 5     2    NA 2019-05-12  1500     NA
# 6     2    NA 2019-05-12  1500     NA
# 7     3    NA 2019-05-12    NA     NA
# 8     3    NA 2019-05-12    NA     NA
# 9     3    NA 2019-05-13    NA     NA
#10     3    NA 2019-05-13    NA     NA
#11     4     4 2019-05-13  1100    275
#12     4     4 2019-05-13  1100    275
#13     4     4 2019-05-14  1100    275
#14     5    NA 2019-05-14    NA     NA
#15     6     3 2019-05-14   200     66.7
#16     6     3 2019-05-14   200     66.7
#17     7     2 2019-05-15  1100    550
#18     7     2 2019-05-15  1100    550
#19     7     2 2019-05-15  1100    550
#20     8    NA 2019-05-15    NA     NA
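The same numbers can also be reached without a join. The sketch below is an alternative in base R, not the method used in the answer above: it run-length encodes the values with `rle()`, shifts the lengths of the non-NA runs by one run, and expands them back to one entry per row. The names `runs`, `is_val`, `val_len`, `prev_len` and `prev_count` are illustrative, and `myval` is typed in by hand from the question's data after the NA replacement.

```r
# Values from the question, with the 0s already replaced by NA
myval <- c(NA, NA, 1500, 1500, 1500, 1500, NA, NA, NA, NA,
           1100, 1100, 1100, NA, 200, 200, 1100, 1100, 1100, NA)

# Run-length encode. Note that rle() puts every NA in its own run,
# which does not matter here because NA runs are skipped anyway.
runs <- rle(myval)
is_val <- !is.na(runs$values)

# Lengths of the non-NA runs, in order of appearance
val_len <- runs$lengths[is_val]

# For each non-NA run, the length of the previous non-NA run (NA for the first)
prev_len <- rep(NA_integer_, length(runs$lengths))
prev_len[is_val] <- c(NA, head(val_len, -1))

# Expand back to one entry per row and divide
prev_count <- rep(prev_len, runs$lengths)
result <- myval / prev_count
```

The resulting vector has one entry per original row, so it could be assigned straight back with `df$result <- result`, which leaves the dates column and the row order untouched.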