R: Updating a data frame with another data frame
Problem description
Let's say our initial data frame looks like this:
df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,NA,NA,NA),C=c(1,2,3,NA,NA,NA))
> df1
Index A B C
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 NA NA
5 5 5 NA NA
6 6 6 NA NA
Another data frame contains new values for columns B and C:
df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5))
> df2
Index B C
1 4 4 5
2 5 4 5
3 6 4 5
How can I update the missing values in df1 so that it looks like this?
Index A B C
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 5
5 5 5 4 5
6 6 6 4 5
My attempt:
library(dplyr)
> full_join(df1,df2)
Joining by: c("Index", "B", "C")
Index A B C
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 NA NA
5 5 5 NA NA
6 6 6 NA NA
7 4 NA 4 5
8 5 NA 4 5
9 6 NA 4 5
As you can see, this created duplicate rows for Index 4, 5 and 6 instead of replacing the NA values.
Any help would be greatly appreciated!
Solution
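For context on why the attempt in the question produced nine rows: when no key is given, the join uses every column the two data frames share, here Index, B and C (this is what the "Joining by" message reports). Rows 4 to 6 of df1 hold NA in B and C, so they never match df2 and both sets of rows are kept. The sketch below shows this with base R's merge(), which picks its default key the same way; the names shared, merged_all and merged_idx are illustrative only.

```r
# Data frames from the question
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

# Columns present in both data frames: the default join key
shared <- intersect(names(df1), names(df2))

# Joining on all shared columns keeps the unmatched rows from both sides
merged_all <- merge(df1, df2, all = TRUE)

# Joining on Index alone keeps one row per Index, but B and C are
# split into .x (from df1) and .y (from df2) columns
merged_idx <- merge(df1, df2, by = "Index", all = TRUE)
```

Joining on Index alone avoids the duplicate rows, but the two versions of B and C then still have to be combined, which is what the approaches below do in different ways.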
merge then aggregate:
aggregate(. ~ Index, data=merge(df1, df2, all=TRUE), na.omit, na.action=na.pass )
# Index B C A
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 5 4
#5 5 4 5 5
#6 6 4 5 6
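If keeping the original column order matters (the aggregate result above returns Index, B, C, A), a different technique is to fill the NA cells in place with match(). This is a minimal base R sketch rather than the answer's method; the names updated, rows, found and fill are illustrative, and it assumes each Index value appears at most once in df1.

```r
# Data frames from the question
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

updated <- df1

# For each row of df2, the position of the same Index in df1 (NA if absent)
rows <- match(df2$Index, updated$Index)
found <- !is.na(rows)

for (col in c("B", "C")) {
  # Only touch cells that exist in df1 and are currently NA
  fill <- found & is.na(updated[[col]][rows])
  updated[[col]][rows[fill]] <- df2[[col]][fill]
}

updated
```

Values that are already present in df1 are left untouched, and rows of df2 whose Index is not in df1 are ignored.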
Or in dplyr speak:
df1 %>%
full_join(df2) %>%
group_by(Index) %>%
summarise_each(funs(na.omit))
#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
# Index A B C
# (dbl) (int) (dbl) (dbl)
#1 1 1 1 1
#2 2 2 2 2
#3 3 3 3 3
#4 4 4 4 5
#5 5 5 4 5
#6 6 6 4 5
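Note that summarise_each() and funs() have since been deprecated in dplyr, so the pipeline above may warn or fail on current versions. One alternative that should work on recent releases is to join on Index only and take the first non-missing value with coalesce(). This is a sketch under those assumptions, not the answer's original method; the ".new" suffix and the name updated are illustrative choices.

```r
library(dplyr)

# Data frames from the question
df1 <- data.frame(Index = 1:6, A = 1:6,
                  B = c(1, 2, 3, NA, NA, NA),
                  C = c(1, 2, 3, NA, NA, NA))
df2 <- data.frame(Index = c(4, 5, 6), B = c(4, 4, 4), C = c(5, 5, 5))

updated <- df1 %>%
  # Join on Index only; df2's columns arrive as B.new and C.new
  left_join(df2, by = "Index", suffix = c("", ".new")) %>%
  # Keep the existing value where present, otherwise take the new one
  mutate(B = coalesce(B, B.new),
         C = coalesce(C, C.new)) %>%
  select(-B.new, -C.new)

updated
```

dplyr 1.0.0 and later also provide rows_patch(), which is intended for this kind of update (it overwrites only NA values in the first data frame using rows matched by key), so rows_patch(df1, df2, by = "Index") may be worth trying as a shorter form.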