将NA值替换为组值 [英] replace NA value with the group value

查看:56
本文介绍了将NA值替换为组值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个如下的df,它在5个家庭中有20个人.家庭中的某些人缺少有关是否拥有医疗卡的数据.我想给这些人和他们家庭中的其他人相同的值(而不是NA值,即为0或1的实际二进制值).

I have a df as follows which has 20 people across 5 households. Some people within the household have missing data for whether they have a med_card or not. I want to give these people the same value as the other people in their household (not an NA value, a real binary value which is either 0 or 1).

我尝试了以下代码,这是我认为朝正确方向迈出的一步-但并非100%正确,因为a)如果每个家庭的med_card的第一个值是NA,则b)无效并不能取代1号家庭中的所有人.

I have tried the following code, which is a step in the right direction I think - but isn't 100% correct because a) it doesn't work if the first value for med_card per household is NA and b) it doesn't replace NA for all people in household 1.

DF<- ddply(df, .(hhold_no), function(df) {df$med_card[is.na(df$med_card)] <- head(df$med_card, na.rm=TRUE); return(df)})

任何指针将不胜感激, 谢谢

Any pointers would be greatly appreciated, Thank you

样本df

df
   person_id hhold_no med_card
1          1        1        1
2          2        1        1
3          3        1       NA
4          4        1       NA
5          5        1       NA
6          6        2        0
7          7        2        0
8          8        2        0
9          9        2        0
10        10        3       NA
11        11        3       NA
12        12        3       NA
13        13        3        1
14        14        3        1
15        15        4        1
16        16        4        1
17        17        5        1
18        18        5        1
19        19        5       NA
20        20        5       NA

和要编写的代码

person_id<-as.numeric(c(1:20))
hhold_no<-as.numeric(c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5,5,5))
med_card<-as.numeric(c(1,1,NA,NA,NA,0,0,0,0,NA,NA,NA,1,1,1,1,1,1,NA,NA))
df<-data.frame(person_id,hhold_no, med_card)

所需的输出

df
   person_id hhold_no med_card med_card_new
1          1        1        1            1
2          2        1        1            1
3          3        1       NA            1
4          4        1       NA            1
5          5        1       NA            1
6          6        2        0            0
7          7        2        0            0
8          8        2        0            0
9          9        2        0            0
10        10        3       NA            1
11        11        3       NA            1
12        12        3       NA            1
13        13        3        1            1
14        14        3        1            1
15        15        4        1            1
16        16        4        1            1
17        17        5        1            1
18        18        5        1            1
19        19        5       NA            1
20        20        5       NA            1

推荐答案

尝试ave.它将功能应用于组.请查看?ave了解详细信息,例如:

Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))

#   person_id hhold_no med_card med_card_new
#1          1        1        1            1
#2          2        1        1            1
#3          3        1       NA            1
#4          4        1       NA            1
#5          5        1       NA            1
#6          6        2        0            0
#7          7        2        0            0
#8          8        2        0            0
#9          9        2        0            0

请注意,这仅在并非一个家庭中的所有值都为NA且两个值的取值不应相同的情况下才有效(例如,人1 == 1,人2 == 0).

Please note that this will only work if not all values in a household are NA and the should not differ (e.g. person 1 == 1, person 2 == 0).

这篇关于将NA值替换为组值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆