Number of consecutive NA
Question
The data looks like this:
subject x1 x2 x3 x4 x5 x6 x7
a 0.1 NA 0.2 0.1 0.1 NA 0.9
b NA NA -0.01 NA 0.3 0.8 0.01
c NA NA NA NA NA 0.9 0.4
d NA NA 0.01 NA NA NA 0.05
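For anyone who wants to reproduce this, the table can be rebuilt in R roughly as follows. This is a sketch: the values are copied from the table above, but the column types (character `subject`, numeric `x1` to `x7`) are an assumption, since the question does not show how `df` was created.

```r
# Rebuild the example data; column types are assumed, not given in the question.
df <- data.frame(
  subject = c("a", "b", "c", "d"),
  x1 = c(0.1, NA, NA, NA),
  x2 = rep(NA_real_, 4),          # all-NA column, kept numeric on purpose
  x3 = c(0.2, -0.01, NA, 0.01),
  x4 = c(0.1, NA, NA, NA),
  x5 = c(0.1, 0.3, NA, NA),
  x6 = c(NA, 0.8, 0.9, NA),
  x7 = c(0.9, 0.01, 0.4, 0.05)
)

df
```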
How can I append a new variable, "the maximum number of consecutive NA", to this data.frame?
subject x1 x2 x3 x4 x5 x6 x7 NA_consecutive
a 0.1 NA 0.2 0.1 0.1 NA 0.9 1
b NA NA -0.01 NA 0.3 0.8 0.01 2 (max NA, not 1!!)
c NA NA NA NA NA 0.9 0.4 5
d NA NA 0.01 NA NA NA 0.05 3 (max NA, not 2!!)
I want to calculate the number of consecutive NA values per subject (i.e., per row).
I tried simply using duplicated(), but it flags any repeated value, including ordinary values, not just NA.
If I transform this data set to "long" with df %>% gather(variable, value, -subject), I get:
subject variable value
1 a x1 0.1
2 a x2 NA
3 a x3 0.2
4 a x4 0.1
5 a x5 0.1
6 a x6 NA
7 a x7 0.9
8 b x1 NA
9 b x2 NA
10 b x3 -0.01
..
Is this form easier to work with?
I don't care what shape the data ends up in; I just need the new information (the maximum number of consecutive NA).
If possible, avoid a "for loop" (though not necessarily completely), because this data is very large.
Recommended answer
Here is a tidyverse option:
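library(tidyverse)

df %>%
  gather(k, v, -subject) %>%      # reshape to long: one row per subject/variable
  arrange(subject, k) %>%         # sort by variable name within each subject
  group_by(subject) %>%
  # start a new run id whenever the NA status flips
  mutate(grp = cumsum(c(0, abs(diff(!is.na(v))) == 1))) %>%
  add_count(subject, grp) %>%     # n = length of each run
  mutate(NA_consecutive = max(n[is.na(v)])) %>%   # longest run that consists of NA
  select(-grp, -n) %>%
  spread(k, v)                    # reshape back to wide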
## A tibble: 4 x 9
## Groups: subject [4]
# subject NA_consecutive x1 x2 x3 x4 x5 x6 x7
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a 1 0.100 NA 0.200 0.100 0.100 NA 0.900
#2 b 2 NA NA -0.0100 NA 0.300 0.800 0.0100
#3 c 5 NA NA NA NA NA 0.900 0.400
#4 d 3 NA NA 0.0100 NA NA NA 0.0500
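To see what the `grp` step is doing, it can help to run it on a single row. The sketch below uses row d from the question, then cross-checks the result with base R's `rle()`. The helper `max_na_run()` and the matrix `m` are hypothetical names introduced here for illustration; they are not part of the original answer.

```r
# Row "d" from the question: NA NA 0.01 NA NA NA 0.05
v <- c(NA, NA, 0.01, NA, NA, NA, 0.05)

# Run ids as in the answer: the id increases each time the NA status flips.
grp <- cumsum(c(0, abs(diff(!is.na(v))) == 1))
grp
# 0 0 1 2 2 2 3

# Hypothetical helper: longest run of NA in a vector, via run-length encoding.
max_na_run <- function(x) {
  r <- rle(is.na(x))
  na_runs <- r$lengths[r$values]   # keep only the runs where is.na() is TRUE
  if (length(na_runs) == 0L) 0L else max(na_runs)
}

max_na_run(v)
# 3

# Row-wise application to the question's data, written as a numeric matrix.
m <- rbind(
  a = c(0.1, NA, 0.2, 0.1, 0.1, NA, 0.9),
  b = c(NA, NA, -0.01, NA, 0.3, 0.8, 0.01),
  c = c(NA, NA, NA, NA, NA, 0.9, 0.4),
  d = c(NA, NA, 0.01, NA, NA, NA, 0.05)
)
res <- apply(m, 1, max_na_run)
res
# a b c d
# 1 2 5 3
```

Note one difference in edge-case behavior: for a row with no NA at all, `max(n[is.na(v)])` in the pipeline takes the maximum of an empty vector, which in R returns -Inf with a warning, whereas this helper returns 0. Whether the `apply()` version is faster on very large data has not been measured here.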